Recent Posts

Raw data as oxymoron

1 minute read

"Raw data is both an oxymoron and a bad idea; to the contrary, data should be cooked with care."
- Geoffrey Bowker

Bowker was spot on in his comments last week at Columbia Journalism School. I can't tell you how many times I've had to make order out of chaos from "raw data," i.e. unintelligible, inaccurate spreadsheets.

Using data-viz to make a wire story stand out from the pack

13 minute read

I've been interested lately in finding examples of online-only, collaborative, non-profit newsrooms that have used data visualization to add value to stories that otherwise wouldn't necessarily be unique, and in doing so beat out legacy news organizations that published a text narrative alone. Take, for example, this data-rich story and interactive map displaying statewide testing results published by NJSpotlight on Friday. While the news that only 8 out of 10 graduating seniors had passed New Jersey's current standardized test in 2011 was widely reported across the state last week, including by the Star-Ledger in Newark and by The Press of Atlantic City, only NJSpotlight took advantage of the story's strong data element to produce a more concise, data-driven visual narrative.

So NJSpotlight obviously deserves kudos for the gap it's filling in New Jersey journalism. What's more, the job it did on the interactive map was fairly sophisticated (I've yet to figure out how to overlay such a highly customized legend onto a Google map). But as is always the case with a deadline project, there's room for improvement. Let's take a look at the good and the bad of NJSpotlight's Friday package on state test results.

First off, this is a classic example of a story where county and/or municipality polygons with a colored fill, layered on top of a Google Map, bring new insight to a widely reported story. Not only can we immediately see from the map that most of the state fell within the 75 to 90 percent range in passing rates, but we can also clearly tell that the north-central region of the state, particularly near the Pennsylvania border, earned significantly higher scores than the rest of New Jersey. The pop-up table with information on vocational and charter schools adds a further layer of nuance to the piece, and does a decent job of displaying the numbers in a table-like format.

The map colors follow a somewhat logical pattern, with green representing high passing rates and red representing lower ones. But the orange, yellow and blue colors that fall in the middle do more to obfuscate than to help. Without referencing the legend, how is the user supposed to know that blue is better than yellow, or that green is better than blue? What's more, the combination of such bright shades from opposite ends of the color spectrum makes the map less pleasing to the eye than it would've been had the designer chosen more subtle, complementary shades. I understand the desire to have red represent 'negative' and green represent 'positive,' but if NJSpotlight had gone with a graduated color scale from red to green, with neutral mid-values such as the ones I used in this recent map on nationwide obesity rates, the map would've been not only more aesthetically appealing, but easier to read.
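To make the idea concrete, here's a minimal sketch of such a graduated scale, interpolating linearly in RGB space from red through a neutral beige to green. The three endpoint colors are my own illustrative picks, not anything NJSpotlight used:

```javascript
// Linear interpolation between two channel values.
function lerp(a, b, t) {
  return Math.round(a + (b - a) * t);
}

// Mix two [r, g, b] colors by fraction t (0 = first color, 1 = second).
function mixColors(c1, c2, t) {
  return [lerp(c1[0], c2[0], t), lerp(c1[1], c2[1], t), lerp(c1[2], c2[2], t)];
}

// Map a passing rate (0-100) to an [r, g, b] triple:
// red at the low end, a muted neutral in the middle, green at the top.
function passingRateColor(rate) {
  var red = [215, 48, 39];
  var beige = [255, 255, 191];
  var green = [26, 152, 80];
  var t = Math.min(Math.max(rate / 100, 0), 1);
  return t < 0.5 ? mixColors(red, beige, t * 2)
                 : mixColors(beige, green, (t - 0.5) * 2);
}

console.log(passingRateColor(0));   // [215, 48, 39]
console.log(passingRateColor(50));  // [255, 255, 191]
console.log(passingRateColor(100)); // [26, 152, 80]
```

Because the scale runs through a single neutral midpoint, a reader can rank any two shades without consulting the legend.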

Another minor criticism of the map is the designer's decision to set the polygon fills to solid colors with 100 percent opacity. This obscures the underlying view of which cities and townships each shape contains. While I understand the thought process that likely went into this decision – it's hard to make disparate colors, especially orange and yellow, stand out when layered on a slightly orange-tinted map with road and highway features – the designer could have easily used a color-palette tool to generate some nice equidistant colors that would've looked fine at an opacity of about 50 or 60 percent.

For the sake of thoroughness, I also want to address the included table that contains the charter and vocational school data. I like the fact that NJSpotlight chose to alternate the background color of every other row in the table. It helps distinguish each row from its neighbor, and gives an extra visual attribute to what easily could have been a stale grid. My only recommendations might be to split up every five or six rows with a dividing line, and to incorporate a bar chart, if possible, for one of the values in each row (presumably the most important one, the percent-passing column).

Overlaying a bubble chart onto a Google map

25 minute read

Others may hate, but I'm a big fan of using bubbles to display data. When implemented correctly (i.e. scaled by area instead of diameter), bubbles can be an aesthetically appealing and concise way to represent the value of data points in an inherently visual format. Bubbles are even more useful when they include interactivity, with events like mouseover and zoom allowing users to drill down and compare similar-sized bubbles more easily than they can in static graphics. So, when I was recently working on a class project on autism diagnoses in New York City, I decided to use bubbles to represent the percentage of students with individualized education plans at all 1,250 or so K-8 New York City schools.
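The area-versus-diameter point deserves a quick sketch. If you scale the radius linearly with the value, a bubble for a doubled value gets four times the ink; scaling the radius by the square root of the value keeps area proportional instead. The constants here are arbitrary illustrations:

```javascript
// Area-true bubble sizing: radius grows with the square root of the
// value, so bubble AREA (what the eye reads) is proportional to the data.
function bubbleRadius(value, maxValue, maxRadius) {
  return maxRadius * Math.sqrt(value / maxValue);
}

// A school with twice the IEP rate gets twice the area, not twice the width:
var r1 = bubbleRadius(10, 40, 20); // 10 px
var r2 = bubbleRadius(20, 40, 20); // ~14.14 px
console.log((Math.PI * r2 * r2) / (Math.PI * r1 * r1)); // 2
```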

Almost by default, I turned to the Google Maps JavaScript API V3, mainly because I'm quasi-familiar with its basic functions and event handlers (as I point out later in this post, I didn't realize that a nifty new service called CartoDB would have automated most of what I was trying to do, albeit without nearly the same level of customization). Nonetheless, based on a tutorial from Karl Agius, as well as some infoWindow help from my data viz professor, Susan McGregor, I created the following interactive bubble map of NYC schools based upon the number of special needs, or IEP, students at each school. The larger the bubble, the greater the percentage of special needs students a school has. Click here to see the map full-screen, or here to download a .zip of my source files for your own customization purposes.

Each bubble on this map represents one of New York City's approximately 1,250 K-8 public schools, including charters. The larger the bubble, the higher the percentage of students with individualized education plans (IEP). Click on a bubble to find out more about the school, or click anywhere within a district boundary to see an overall average IEP rate for the district. Zoom and pan to see other parts of the city.

You'll notice the opacity for the bubbles is set to 40 percent. This gives us a quick visual of the locations with the highest density of special needs students, since those areas on the map will naturally be darker where multiple semi-opaque layers overlap. Setting a low opacity also prevents overlapping bubbles from covering one another up. You'll also notice that the map includes polygons for each school district, which you can click on to get an average IEP rate for the entire district. I decided against graduated fill colors for the school district shapes so as to avoid implying causation, as well as to lessen the visual clutter.
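Putting those two choices together, the options for each bubble might be built like this. This is a hypothetical helper in the spirit of the map described above, not my actual source code, and the field names (`lat`, `lng`, `iepPercent`) are invented for illustration:

```javascript
// Build an options object for one school's bubble. In the real map the
// center would be a google.maps.LatLng and the object would be passed to
// new google.maps.Circle(...); it's kept as plain data here so the
// sketch runs outside the browser.
function circleOptionsFor(school) {
  return {
    center: { lat: school.lat, lng: school.lng },
    radius: 50 * Math.sqrt(school.iepPercent), // area-proportional sizing
    fillColor: "#cc0000",
    fillOpacity: 0.4, // 40% opacity so overlapping bubbles stay legible
    strokeWeight: 1
  };
}

var opts = circleOptionsFor({ lat: 40.71, lng: -74.0, iepPercent: 16 });
console.log(opts.fillOpacity); // 0.4
console.log(opts.radius);      // 200
```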

Preparing the data

To create the map, I first had to download the underlying data from the New York City Department of Education database as a .csv, then import it into Excel to clean it up and leave only the relevant information. Although the dataset included street addresses split across multiple columns, I was able to use Excel's CONCATENATE function to merge the street, city, state and zip columns into a full street address. From there, I used my favorite batch geocoding service to convert the addresses into geographic coordinates the Google Maps API can read. Check out my resulting .csv file here for an example. Then I imported the .csv into a Google Spreadsheet, and pasted the resulting spreadsheet's URL into the dataSourceUrl field in the JavaScript of my main index.html file. Here's how that looked in my code:

var dataSourceUrl = ""; // URL of the published Google Spreadsheet (omitted here)
var onlyInfoWindow;     // a single InfoWindow, reused so only one stays open at a time
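As an aside, that Excel CONCATENATE step is easy to reproduce in code if you'd rather skip the spreadsheet round-trip. Here's a small sketch in JavaScript with hypothetical column names; only the joining logic is the point:

```javascript
// Merge split address columns into one string a batch geocoder can read,
// mirroring the CONCATENATE step described above. Field names are
// invented for illustration.
function fullAddress(row) {
  return [row.street, row.city, row.state].join(", ") + " " + row.zip;
}

var row = { street: "52 Chambers St", city: "New York", state: "NY", zip: "10007" };
console.log(fullAddress(row)); // "52 Chambers St, New York, NY 10007"
```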

Why calculus matters when it comes to data-driven stories

7 minute read

A quick refresher from my data visualization professor here at Columbia a couple of weeks ago reminded me why I was forced to spend all those grueling hours calculating standard deviation back in high school.

See, when you're using a data set to tell a story, the first step is to understand what that data says. And to do that, you've got to have a good idea of the range and variation of the values at hand. Figuring that out can not only help you determine whether there's any statistical significance to your data set, but also pinpoint outliers and possible errors within the data before you begin the work of visualizing it.

Thanks to powerful programs like Excel, we can figure out the variability of a data set pretty easily using the built-in standard deviation function (remember this intimidating-looking equation from calculus class?). Still, it always helps to know how to calculate it by hand, if only to get a conceptual idea of why numbers such as the standard deviation (the variability of a data set) and the z-value (the number of standard deviations a given value lies from the mean) matter in the first place when it comes to data visualization.

So, to brush up on my formulas and better understand the numbers behind an actual story assignment for one of my classes, I recently hand-calculated the standard deviation and z-values for a set of data on state-by-state obesity rates. From my calculations, I was able to use the standard deviation (3.24) to determine that most states fell within the middle of the bell curve around the national average obesity rate (27.1 percent). In addition, the z-values helped me understand which states stood out from the pack as possible outliers (Mississippi is by far the most obese with a 2.13 z-value, Colorado the least obese with a -1.9 z-value). To get an idea of how those formulas look hand-calculated in Excel, check out my spreadsheet here. And keep these formulas in mind while working on your next data story. They can potentially save you time and effort by helping you figure out what your data set says before you go through the often-lengthy process of visualizing it.
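For readers who'd rather see the hand calculation as code, here's the same arithmetic in JavaScript. The obesity figures below are a small hypothetical sample, not the 50-state data set from my assignment:

```javascript
// Arithmetic mean of an array of numbers.
function mean(xs) {
  return xs.reduce(function (a, b) { return a + b; }, 0) / xs.length;
}

// Population standard deviation: square root of the mean squared deviation.
function stdDev(xs) {
  var m = mean(xs);
  var sqDiffs = xs.map(function (x) { return (x - m) * (x - m); });
  return Math.sqrt(mean(sqDiffs));
}

// z-value: how many standard deviations a value sits from the mean.
function zValue(x, xs) {
  return (x - mean(xs)) / stdDev(xs);
}

var rates = [20, 25, 27, 30, 33]; // hypothetical state obesity rates
console.log(mean(rates));       // 27
console.log(stdDev(rates));     // ~4.43
console.log(zValue(33, rates)); // ~1.35
```

A z-value well above 2 or below -2 is a quick flag for a possible outlier (or a data-entry error) worth checking before you visualize.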

What makes “the world’s best designed website”

16 minute read

With the Pulitzer Prize announcements coming up later this afternoon, you'd think I'd be writing about who's up for the "Best Deadline Reporting" or "Best Public Service Journalism" prizes. But instead I want to talk about a different media award doled out during the past week: the Society for News Design's naming of The Boston Globe's new site as the "world's best designed website." Put simply, I can't say I disagree.

Yet before I indulge in my effusive praise of the folks in Boston, let me point out that I'm still somewhat skeptical of the business logic behind The Globe's decision to launch its paywall-only site last fall alongside its primary free news portal. From a revenue standpoint, I can't see the full-paywall site bringing in nearly enough subscription income to compensate for the traffic and ad dollars it will, and to some degree already has, leeched from the free site (at last count, in February, the new site had recruited only about 16,000 paid digital subscribers, many of whom had taken advantage of the introductory offer of 99 cents for the first four weeks and may not stick it out past the trial period). It might've been wiser for the paper to start with a metered paywall to warm users to the idea of paying for content before erecting a full-blown ten-foot-tall paywall around its most valuable content under a new, separate domain name. But then again, who really knows – maybe it's a step in the right direction long-term? I'm not the one having to make the tough calls, so I'm in no position to judge.

At any rate, the segregation of the two sites has given the paper's parent company the freedom to build a rich, immersive and interactive user experience that few other news organizations can match. Why, you ask? Because of a little trend gaining steam in the development world called "responsive design." See, the new site isn't a standout only because of its sophisticated use of white space, its wholehearted embrace of web fonts and its visual-first approach to story art, but also because of its cross-platform capabilities. No matter what device you view it on – desktop, tablet or mobile – the site retains the same slick look and feel. The site's adaptive technology detects the screen size of the user, then adjusts the layout to fit the exact resolution at hand. On desktops, that means content stretches to fill the entire browser window, and the grid restructures itself appropriately when you decrease or increase the size of the window. This eliminates the need for time-honored design standards like the 960 grid system, which is based upon the desktop-centric idea that all users have at least a 960px-wide screen. Now, the grid can be as big, or as small, as the user wants it to be.

The site's successful display of a dynamic range of real-time content on any-sized device also essentially eliminates the need for the "app-based" environment that's been the staple of the iOS and Android operating systems in recent years. To test it out, I gave the site a whirl on my iPad, and was pleased to see the front page neatly rearrange itself to fit the new orientation, just as the best platform-specific news apps out there do, including The New Yorker and Wired apps for iPad. But just because the site doesn't need a native app to display its content beautifully on tablet and mobile devices doesn't mean it won't miss out on the growing app-centric marketplace for publishers. As Apple and Google continue to centralize digital consumption patterns into the app-based model, the Globe may be a little too ambitious in thinking it doesn't need to play nice with the big tech companies to reach its audience. On the other hand, it could prove to be a brilliant move for the paper, setting an example for other publishers who want to break free from the often constricting, not to mention pricey, business of participating in the app marketplace.

For its design alone, the site deserves any number of awards. Its bold, minimalist interface allows content to shine above all else, free from clutter and distraction. And with high-res horizontal photos, block quotes, inline video and stylized headlines that grab your attention without hurting your eyes, that content stands out even more. What's more, the site has the functionality to back up its aesthetics. Its "Save" feature allows users to bookmark articles for reading offline on any device – a brilliant feature for any news site to implement, even one without responsive design. Moreover, its "Story Flow" panel at the bottom of each article allows users to click seamlessly through to stories on similar subjects, just as readers would flip through the physical sections of an old-fashioned newspaper. Which leaves me with one nagging question: Is the site too imitative of physical newspapers to hold today's short-attention-span digital audience? Is it too skeuomorphic to succeed in an SEO-driven, hyperlinked news economy?

I guess we'll have to wait and see.

Critique: “Agreement Groups in the United States Senate”

6 minute read

Take a look at this fascinating visualization of U.S. Senate agreement groups made by Ph.D. student Adrian Friggeri. Using a complex agreement algorithm based upon Senate voting records, the visualization displays how much all 100 senators of each U.S. Congress during the last 15 years have crossed the aisle – or stuck to party lines – on Senate-floor votes.

From a design standpoint, the visualization is nearly flawless. The thin red and blue lines help the user form an instant party association, and the light gray bars in the background distinguish each Congress from the next without creating visual clutter. What's perhaps most impressive is that, despite containing far more than 100 data points, the visualization keeps its information fairly easy to access and its interface simple in feel. Because each senator's entire individual trajectory is highlighted on mouseover, users can get a glimpse of how willing their respective senator has been to negotiate a compromise across party lines over the years.

Most of all, the visualization does what all good visualizations should do: it tells a story without text. As we can see, the number of Democrats who have crossed the aisle is notably larger than that of their GOP counterparts. This becomes even clearer when we drill down to look at each party's trajectory individually, where the connections stand out. What I would've liked to see in addition, however, is some sort of summary or average value for the disparity between the two parties' agreement rates, even if just a number at the bottom of the visualization. As it stands, the user has to dissect the visualization a good bit to tell that Democrats have a higher "agreement rate" than Republicans.

The networked line structure reminds me a lot of the Wall Street Journal's "What They Know" visualization, except that this visualization has a good bit less clutter and complexity, and much better styling choices.

Response to Norman, “Emotional Design”

5 minute read

Good aesthetics are more than just fluff when it comes to design. They are a core part of a product's functionality. Such is the argument Donald A. Norman makes in his insightful 2005 book Emotional Design: Why We Love (or Hate) Everyday Things. For Norman, attractive things work better: they boost the mood of the user, allowing him or her to think more clearly and operate the product more efficiently.

Undergirding Norman's thesis that aesthetics directly influence operability is his distinction between the three basic levels of human cognition: the visceral (jumping at a sudden sound in a quiet room), the behavioral (relaxing in the solitude of a quiet room) and the reflective (thinking to oneself about why a quiet room is more enjoyable). As Norman asserts, these three levels of thought processing "interact with one another, each modulating the others" (7). You cannot escape the effect that one level of thought processing has on the others. As such, a visceral reaction to an external stimulus influences the subsequent behavioral reactions we have, which in turn influence our reflective conclusions about the stimulus itself. If we have a negative visceral reaction to a poorly designed website, our mood is negatively affected in a way that hinders our ability to navigate and use the site, even if there's nothing wrong with the navigation or user interface from a technical standpoint. All our brain can focus on is the poor design. This reaction is similar to the way humans form first impressions of others; if an individual makes a poor first impression (a visceral reaction), we are less receptive to his or her future actions or speech (a behavioral reaction), which in turn shapes the entire way we think about that person (a reflective reaction).

Response to Saffer, “Designing for Interaction”

7 minute read

Interaction designer Dan Saffer artfully captures both the practical and the theoretical aspects of his profession in his 2006 book Designing for Interaction: Creating Smart Applications and Clever Devices. From its title, Saffer's book may sound like a simple "how-to" guide to creating web apps with interactivity. Yet while it is certainly that to an extent, the book is more broadly a treatise on, and exploration of, the ideology and terminology behind interaction design.

Saffer sets out to answer seemingly simple questions such as "What is interaction design?" and "What is the value of interaction design?" with thoughtful, reflective analyses. The principal purpose of interaction design, he argues, is "its application to real problems" and its ability to "solve specific problems under a specific set of circumstances" (5). As such, interaction design is inherently attached to the physical world, and is "by its nature contextual," changing and evolving in its definition over the course of time and space (4). Paradoxically, however, Saffer argues that the core principles of good interaction design are "technologically agnostic," and don't change along with the ebbs and flows of technological innovation: "Since technology frequently changes, good interaction design doesn't align itself to any one technology or medium in particular" (7). How can Saffer simultaneously assert that interaction design is tethered to its particular context in time, while in the same breath arguing that it remains unchanging in its core values? Don't the two statements on some level contradict one another? Saffer would likely respond by saying that only the "principles" behind interaction design – helping people communicate with one another and, to a lesser degree, with computers – remain constant amid technological upheaval. But this view verges on downplaying the power of future technological change to fundamentally alter every known aspect of the way we communicate. What if, in ten years, a technology comes along that automates communication in such a way as to produce a paradigm shift in the role of the interaction designer? Although Saffer is correct in his assertion that even the rise of the Internet has not so far altered the core principles of interaction design, that doesn't mean such constancy will always be the case.

On Richard Boardman’s “Bubble Trees: The Visualization of Hierarchical Structure”

3 minute read

In his brief two-page paper "Bubble Trees: The Visualization of Hierarchical Structure," Richard Boardman proposes a new type of interactive presentation of hierarchical data that he calls the bubble tree. To bolster his argument, Boardman points out the difficulties inherent in the traditional "tree" structure, which suffers from the "breadth versus depth" problem, leading to information overload and eating up too much screen real estate. As a solution, he proposes a clickable bubble tree that expands into child and grandchild bubbles. Because of its interactive nature and nested structure, Boardman's bubble tree would "naturally allow the user to explore and work out relationships for themselves," he says.
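A minimal sketch makes the breadth-versus-depth point concrete: the hierarchy is just nested nodes, and "expanding" a bubble surfaces only one level of children at a time instead of drawing the whole tree at once. The budget-style data below is invented for illustration:

```javascript
// Hierarchical data as a nested tree of named nodes.
var root = {
  name: "Budget",
  children: [
    { name: "Education", children: [{ name: "K-8" }, { name: "High school" }] },
    { name: "Transport", children: [{ name: "Roads" }] }
  ]
};

// Expanding a bubble reveals only its direct children, so the screen
// never has to hold the full breadth and depth of the tree at once.
function expand(node) {
  return (node.children || []).map(function (child) { return child.name; });
}

console.log(expand(root));             // ["Education", "Transport"]
console.log(expand(root.children[0])); // ["K-8", "High school"]
console.log(expand(root.children[1].children[0])); // [] (a leaf)
```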

Since the publication of Boardman's paper, this style of bubble tree has become something of a fashion in the information design community, with popular JavaScript libraries such as Bubbletree.js putting the creation of complex, hierarchical bubble trees into the hands of the general web development public. As its popular use has demonstrated, Bubbletree.js can be particularly handy when it comes to displaying Open Spending data.