This is the first of two tutorials on adding hover interactions to interactive maps using free, open-source tools.
Today I wanted to break down the step-by-step process of how I created this interactive, chloropleth map of Georgia population change from 2010 to 2011. While today's tutorial doesn't cover how to add hover states or moving tooltips to your maps, such as those you see in this iteration of the same map made using CartoDB+Leaflet, I'll cover how to add those more advanced features in a later post (they require using a few more tools and doing a bit more programming). For now, though, we're going to stick to the basics of simple hover interactions. Here's what you'll need to follow along:
1. Gathering and preparing the data
The first thing you'll need to do is locate your data and parse it down to a format in which it can be easily visualized. For this example, we're using population estimates from the U.S. Census Bureau, which, luckily for us, has already done most of the heavy lifting analysis-wise. Download the data for Georgia or whatever other state you wish to visualize as a comma separated value (.csv) file from this page. Now you'll want to turn that .csv file into a Google spreadsheet. You can do this from the Google Drive dashboard by selecting Create>Spreadsheet, then choosing File>Import once your new spreadsheet opens. Locate and upload the .csv file. Select "Replace current spreadsheet" and set "Comma" as the separator character. Voila. Your data should then appear for each county. Here's what the correct options for that will look like:
Given that each county has a different base population, the only standar way to compare en masse how many residents each county gained or lost is by calculating the percent change (see here to figure out how to calculate that in Excel or Google Spreadsheets). In this example, however, the Census Bureau has already calculated out the percent change for us, making our job that much easier. Delete all the unnecessary columns from the spreadsheet, leaving only the county name in the first column, the 2010 and 2011 population totals in the second and third columns, and the "Percent" value in the fourth column (the "Percent" value refers to the percent change between the two years). Also make sure to delete any extra rows at the top or bottom of the spreadsheet, so that the first row contains the column titles, with the first county beginning on the second row and ending on the last row. Highlight the column containing the county names and select “Data>Sort sheet by column, A-Z." This will put the entries in alphabetical order.
If you happen to get a weird period ('.') preceding each county name as I did, you can get rid of it pretty easily by performing the following steps:
- Insert a new column to the right and, assuming your first county name begins in cell A2, enter the following formula into B2: =MID(A2,2,LEN(A2)). This function deletes the first character of the county name –– the unwanted period –– automatically.
- Copy and paste the new B2 cell into the rest of the rows to apply the same formula throughout the entire spreadsheet.
- Copy the new period-free column you just created and paste it as unformatted text into another new column to the right by selecting "Paste special>Paste values only." This formats the data in new column as values only so that it they won't include a formula that depends on the incorrectly formatted column to work.
- Delete the first two columns so that the first column now becomes the county names only, free of the preceding periods.
We're almost finished getting the data ready. All we have left to do is add in a column containing the official county codes for each county so that we'll have a common attribute with which we can merge the spreadsheet with the geometric data later. Download this .csv from the Census Bureau, which contains the county codes for all 50 states. Open it as a Google spreadsheet and delete all the rows except for the ones for the state you're visualizing. With only the rows for the state at hand remaining, highlight the "ctyname" column and select "Data>Sort values from A-Z." This should reorder the spreadsheet to be exactly the same alphabetic order as the spreadsheet with the population totals. Copy the "county" column containing the numeric county codes, which should now also be arranged in numeric order, and paste it into a new column in the spreadsheet with the population totals, which should also be sorted alphabetically. This should now give you the correct county codes for each county in a new column. Title that column simply "COUNTY" (all-caps). For an idea of what things should look like at this point, check out my Google spreadsheet here, or see the following screenshot:
One last thing to keep in mind: If your county codes include values less than three digits, which they probably will, make sure any values less than three-digits long have 0s preceding them to force them to be three digits long (i.e. '001'). That way, it will match up with the three digit values in the geometric data later on in the process.
Now that we have our population data ready, let's download the corresponding geometric county polygons from the Census Bureau here. For this tutorial we'll be using the shapefiles (.shp) format, so make sure to select the shp.zip option on the download page. I obviously used Georgia in this example; feel free to choose whatever other state you desire, so long as you follow the exact same instructions. Unzip the archive to your computer. You should see three files in the new unzipped folder: a .dbf, a .shp and a .shx. All we'll be need for this tutorial is the .shp file.
At this point, you should have two different files: a Google spreadsheet of data formatted something like this, and a shapefile of corresponding county polygons. The next step will be to take the population data and bind it to the shapefile so that the two match up.
2. Binding the data to the shapefile using QGis
Fire up QGis. Select the "Add Vector Layer" option from the top of the window and locate the .shp file you downloaded earlier. Open it in QGis and you should see a nice outline of your state that looks something like this:
Now go back to your newly created Google spreadsheet and export the data as a .csv by selecting "File>Download data as." After downloading the .csv, rename it to something simple like "georgia.csv." The next thing we need to do is import the .csv into QGis. But before we can do that, we need to create a new '.csvt' file by the same name and in the same directory as the .csv that will tell QGis what type of data each column is (string, number, real, etc.). For this example, your '.csvt' file will look something like this, with a data type defining each column of the .csv:
If you're having trouble with this part, download my .csvt here. Should you need to adjust the data types, just force open it in TextEdit and change them. The main two things to make sure is that the COUNTY column is defined as a string and the PERCENT column is defined as a value.
Once your .csvt file has been placed in the same directory as your .csv, go back to QGis and add a new vector layer just as you did before with the shapefile. This time, locate the .csv file and open it. QGis should then automatically detect the .csvt file in the background and assign the appropriate data types to each column in the .csv. To make sure this worked correctly, you can control-click the new .csv vector layer and select "Properties>Fields" to check that each field has the appropriate data type.
Now you can get down to the business of binding the data from the .csv to the shapefile. Select the shapefile layer and go to "Properties>Joins." Add a new vector join, setting the .csv as the join layer and both the join field and the target field as COUNTY, like this:
Applying the join should merge your spreadsheet and shapefile, binding the population data to the polygons using the shared "COUNTY"attribute that contains the matching county codes. To check and make sure everything worked correctly, control-click the shapefile vector layer in QGis and select "Open attribute table." You should see the population data attached as columns to the end of the attribute table.
Once you've ensured that the data has been attached to the polygon vector layer, you can now export the new shapefile by control-clicking the shapefile vector layer, selecting "Save as," and saving the layer in the ESRI Shapefile format. Now you're ready to compress the shapefile package into a .zip and import it into Tilemill. where you can style it, add interactivity and more.
3. Styling the map in Tilemill
. Create a new project. Under the layers panel, add a new layer and locate the .zip of the ESRI shapefile you just exported from QGis. Upload the package. Upon doing so, you should immediately see the polygons for your state. Now you'll need to style the map in the style.mss panel using the Carto language. Because the numbers at hand for this map represent either a positive or a negative percent change, it makes sense to create a chloropleth map where red represents negative values and green represents positive values. You might try using ColorBrewer
to find the right color ramp for your data. For this example, I used the following style parameters: