QGIS is an advanced desktop mapping tool, comparable to ArcGIS. It is a strong tool for geospatial analysis, and it makes it easy to export your work to static formats (PDF, TIFF, etc.). QGIS is less well suited to making interactive web maps, although various plugins facilitate this work. Its main downside is a steep learning curve.
In the hands-on portion of this workshop, we will use census data from American Factfinder and boundaries from New Jersey Geographic Open Data to create a thematic map. We will go through the steps of a simple workflow to visualize percentages of languages spoken at home by New Jersey county.
QGIS uses map layers. You might initially find it surprising that QGIS does not start with a base layer, like Google Maps. QGIS expects you to add every layer for yourself.
Navigate through American Factfinder to download the data for yourself. Go to https://factfinder.census.gov and click on ‘Advanced Search’ > ‘SHOW ME ALL’.
Click the Geography button, and select geographic type ‘county’, state ‘New Jersey’, then ‘All counties within New Jersey’, and click ‘ADD TO YOUR SELECTIONS’.
Next, click on the Topics button, and select ‘People’ > ‘Language’ > ‘Language spoken at home’.
You should now see a table with American Community Survey (ACS) results. Select the row with the ID of B16001 (‘LANGUAGE SPOKEN AT HOME BY ABILITY TO SPEAK ENGLISH FOR THE POPULATION 5 YEARS AND OVER’).
Download the data for 2015. Be sure to click the radio button next to ‘Use the data’ and uncheck the box next to ‘Merge the annotations and data into a single file?’ (we want data and annotations to be in separate files). When your file is complete, click the download button.
Next, go to http://njogis-newjersey.opendata.arcgis.com/datasets/new-jersey-counties and download the NJ county boundaries as a shapefile.
Once you’ve downloaded both datasets, you should unzip them into a directory on your Desktop (right click and select ‘Extract all’).
Open ACS_15_5YR_B16001_metadata.csv with Excel. The U.S. Census uses codes to denote various languages and concepts. Note that ACS_15_5YR_B16001.csv has two header rows. Ordinarily this might pose a problem for data manipulation, but QGIS has a feature that allows us to ignore one header row.
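Outside QGIS, the same "ignore one header row" idea can be sketched in a few lines of Python. This is only an illustration, not how QGIS does it internally: the sample rows and the helper name `read_skipping` are made up, and the sketch assumes the first row holds the codes and the second row holds the readable labels.

```python
import csv
import io

# Made-up miniature of the ACS file: a row of codes, a row of labels, then data.
SAMPLE = (
    "GEO.id2,GEO.display-label,HD01_VD01\n"
    "Id2,Geography,Estimate; Total:\n"
    '34001,"Atlantic County, New Jersey",1000\n'
    '34003,"Bergen County, New Jersey",2000\n'
)

def read_skipping(text, skip_lines=1):
    """Discard `skip_lines` leading lines, then use the next line as field names."""
    lines = io.StringIO(text)
    for _ in range(skip_lines):
        next(lines)  # throw away the row of codes
    return list(csv.DictReader(lines))

rows = read_skipping(SAMPLE)
print(rows[0]["Id2"])               # 34001
print(rows[1]["Estimate; Total:"])  # 2000
```

Skipping the first row means the columns end up named by their labels (for example `Id2`) rather than by their codes (for example `GEO.id2`).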
To help familiarize ourselves with the ACS data, let’s take a look at a subset. The following table provides an overview of the top ~14 languages spoken at home in Middlesex County by population.
| Code | Description | Estimate |
|---|---|---|
| HD01_VD02 | Estimate; Total: – Speak only English | 448987 |
| HD01_VD03 | Estimate; Total: – Spanish or Spanish Creole: | 123910 |
| HD01_VD54 | Estimate; Total: – Hindi: | 25035 |
| HD01_VD51 | Estimate; Total: – Gujarati: | 24425 |
| HD01_VD66 | Estimate; Total: – Chinese: | 21408 |
| HD01_VD93 | Estimate; Total: – Tagalog: | 10736 |
| HD01_VD57 | Estimate; Total: – Urdu: | 9151 |
| HD01_VD108 | Estimate; Total: – Arabic: | 8568 |
| HD01_VD36 | Estimate; Total: – Polish: | 7925 |
| HD01_VD33 | Estimate; Total: – Russian: | 7313 |
| HD01_VD15 | Estimate; Total: – Portuguese or Portuguese Creole: | 7028 |
| HD01_VD72 | Estimate; Total: – Korean: | 5910 |
| HD01_VD12 | Estimate; Total: – Italian: | 4921 |
| HD01_VD87 | Estimate; Total: – Vietnamese: | 3510 |
Each data set can be added to QGIS as a layer. The type of layer will depend on the kind of data that you are using. We will begin by adding the county boundaries for New Jersey. Since these are in a shapefile, they are a vector layer.
To add the layer, click “Layer > Add Layer > Add Vector Layer.” You will then navigate to the directory where the New_Jersey_Counties shapefile is stored. You only need to load the .shp file. Even though the shapefile actually has several different files associated with it, you add it as a file and not as a directory. QGIS automatically imports the associated files.
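If a shapefile refuses to load, a common cause is that one of its companion files went missing when the archive was unzipped. The following is a minimal sketch for checking this, assuming the usual convention that a shapefile needs matching `.shp`, `.shx` and `.dbf` files. The helper name `missing_parts` and the example listing are hypothetical.

```python
from pathlib import Path

# The three parts a shapefile cannot do without (.prj and others are optional).
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")

def missing_parts(shp_name, folder_listing):
    """Return the required extensions that have no matching file in the listing."""
    stem = Path(shp_name).stem.lower()
    present = {name.lower() for name in folder_listing}
    return [ext for ext in REQUIRED_EXTENSIONS if stem + ext not in present]

# Hypothetical folder contents in which the .shx index file was lost.
listing = [
    "New_Jersey_Counties.shp",
    "New_Jersey_Counties.dbf",
    "New_Jersey_Counties.prj",
]
print(missing_parts("New_Jersey_Counties.shp", listing))  # ['.shx']
```

On a real folder, the listing could come from `os.listdir(folder)`.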
After loading the shapefile, it will be displayed as a layer in QGIS.
There are a number of things you can do now that QGIS is displaying data. You can pan and zoom using the tools in the tool bar. It is particularly useful to right click on the layer in the list of layers to the left and choose “Zoom to layer.” Using the “Identify Features” tool under the View menu, you can click on individual counties and see the data associated with them.
We are going to do two things: inspect the data associated with the layer, and change how the layer appears on the map based on the data.
First, to see the data associated with the shapefile, you can right click on the layer and choose “Open Attribute Table.” A window will pop up with a table.
Notice the way that this data is structured. There is one row for each “feature” in the shapefile, in this case, one row for each county. Each county is associated with a set of variables, and each variable is stored in a column. These variables can hold various kinds of information: in this case there are text (i.e., string) fields as well as numeric fields. The kinds of information are important too. Some of the fields are place names, such as COUNTY_LAB for the county name; others are geographic information, like SQ_MILES for the area of the county in square miles. Yet others seem cryptic, such as FIPSSTCO, but these fields provide a unique identifier that lets us connect this spatial data to other kinds of data. This table also contains the population and population density of the county from various years.
Second, let’s change the way that the map is displayed. Instead of displaying the map with the random colors that QGIS assigned to it, we will tie the colors on the map to data in the attributes. You can do this by right clicking on the layer in the list of layers, then clicking “Properties” and the “Style” tab. QGIS calls the way that data is displayed its “symbol.” The symbol is normally the same for each feature in a layer. For instance, if all we wanted was to change the color and border of the boundaries, we could do so by selecting the “single symbol” option. This would be appropriate if we were only interested in the boundaries, or if we had, say, one shapefile for schools, another for churches, and so on, and wanted to represent each by a different symbol.

In our case we want to pick the “graduated symbol” option, meaning that we are going to assign each feature to a bin associated with a color. By selecting POP2010 as the column, we are saying that the color should be determined by that variable. The number of “classes” is the number of bins. The “mode” is the way we determine what the boundaries of the bins should be. In this case we will use the Jenks natural breaks algorithm, which tries to make each bin as distinct as possible from every other bin, while making the items in each bin as much alike as possible. There are a number of color ramps to choose from, most taken from the Color Brewer palettes. Clicking “Classify” and “OK” assigns our counties to bins.
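To get a feel for what natural breaks is optimizing, here is a small brute-force sketch. It is not QGIS's implementation: it simply tries every way of cutting the sorted values into contiguous bins and keeps the split with the smallest total squared deviation from each bin's mean. The input values are made up, and this approach is only practical for short lists such as a state's counties.

```python
from itertools import combinations

def sse(values):
    """Sum of squared deviations from the mean: how spread out one bin is."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def natural_breaks(values, n_classes):
    """Try every split of the sorted values into n_classes contiguous bins
    and return the bins with the smallest total within-bin spread."""
    data = sorted(values)
    n = len(data)
    best_cost, best_bins = None, None
    for cuts in combinations(range(1, n), n_classes - 1):
        edges = (0, *cuts, n)
        bins = [data[a:b] for a, b in zip(edges, edges[1:])]
        cost = sum(sse(b) for b in bins)
        if best_cost is None or cost < best_cost:
            best_cost, best_bins = cost, bins
    return best_bins

# Made-up values with three obvious clusters.
populations = [12, 15, 14, 80, 85, 300, 310, 305]
for group in natural_breaks(populations, 3):
    print(group)
```

With these values the three bins follow the three clusters, which is the "alike within, distinct between" behavior described above.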
You should now see a map of the counties classified by their population bins.
You might also want to turn on labels for the shapefile. Right click on the layer, select “Properties > Labels”. In the top drop down menu, select “Show labels for this layer”. In the second drop down menu, select “COUNTY_LAB” as the field that contains your labels. Click “OK”.
But we want to create a map of more interesting information than the population of counties. To do that we need to use the data that we downloaded from American Factfinder as a CSV.
Click back to ACS_15_5YR_B16001.csv in Excel. The first thing to notice about this file is that the second column, GEO.id2, contains the same kind of code that we saw in the attribute table of our shapefile (column FIPSSTCO). The FIPS code identifies each county in the state.[1] This is the key to joining our spatial data (the shapefile) to our census data (the CSV file). Recall that to learn what the various language codes mean, we have to return to the metadata file that is associated with our data file. The metadata file is ACS_15_5YR_B16001_metadata.csv. Now we are ready to join the data.
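Looking up codes by hand in the metadata file gets tedious. As a rough sketch, the lookup can be turned into a dictionary. This assumes the metadata file has one code and one label per line, which you should confirm against your own download. The three pairs below are taken from the table earlier in this handout (written with a plain hyphen), and `load_labels` is a hypothetical helper.

```python
import csv
import io

# Assumed layout of the metadata file: one "code,label" pair per line.
METADATA = (
    "HD01_VD54,Estimate; Total: - Hindi:\n"
    "HD01_VD36,Estimate; Total: - Polish:\n"
    "HD01_VD15,Estimate; Total: - Portuguese or Portuguese Creole:\n"
)

def load_labels(text):
    """Map each census column code to its human-readable label."""
    reader = csv.reader(io.StringIO(text))
    return {row[0]: row[1] for row in reader if len(row) >= 2}

labels = load_labels(METADATA)
print(labels["HD01_VD54"])  # Estimate; Total: - Hindi:
```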
To join the data, we need to load the CSV into QGIS. We can do this by choosing the “Layer > Add Layer > Add Delimited Text Layer” menu option, then navigating to our file. If the file had spatial information (e.g., latitude and longitude) we could let QGIS know where to find it. But this file has no spatial information so we will select “no geometry.” Make sure to discard 1 header row, and to check the box next to “first record has field names.”
Now we have both layers in QGIS and need to join them together. To do that, we will right click on the original shapefile layer, select “Properties,” and click the “Joins” tab. Earlier we noticed that both the shapefile and the CSV file had a field for the FIPS code. We need to specify which layer we are joining, then let QGIS know that the id2 (in ACS_15_5YR_B16001) and FIPSSTCO (New_Jersey_Counties) fields contain the same values.
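Conceptually, the join is a lookup by FIPS code: for each county in the shapefile, find the CSV row with the same code and attach its columns. The sketch below shows that idea in plain Python and is not what QGIS runs internally. The helper names are hypothetical and the county labels are illustrative. The Hindi estimate for Middlesex comes from the table earlier in this handout, and Bergen is deliberately left out of the census rows to show what an unmatched feature looks like.

```python
def fips_key(value):
    """Normalise a FIPS code to a 5-character string so 34023 and '34023' match."""
    return str(value).strip().zfill(5)

def left_join(features, table, feature_key, table_key, prefix):
    """Attach the table's columns to each feature that has a matching key.
    Features without a match get None in every joined column."""
    columns = [name for name in table[0] if name != table_key]
    lookup = {fips_key(row[table_key]): row for row in table}
    joined = []
    for feature in features:
        match = lookup.get(fips_key(feature[feature_key]), {})
        extra = {prefix + name: match.get(name) for name in columns}
        joined.append({**feature, **extra})
    return joined

counties = [
    {"COUNTY_LAB": "Middlesex", "FIPSSTCO": "34023"},
    {"COUNTY_LAB": "Bergen", "FIPSSTCO": "34003"},
]
census = [
    {"Id2": 34023, "Estimate; Total: - Hindi:": 25035},
]

joined = left_join(counties, census, "FIPSSTCO", "Id2", "ACS_15_5YR_B16001_")
for row in joined:
    print(row["COUNTY_LAB"], row["ACS_15_5YR_B16001_Estimate; Total: - Hindi:"])
```

Note that the sketch compares the codes as text. If one side stores the code as a number and the other as text, a join can silently match nothing, so it is worth checking the field types when a join comes back empty.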
Once we have completed the join, we can reopen the shapefile’s attribute table. Where before we only had geographic information, now we have access to all of the columns that were in the CSV file. We can create a graduated symbol as we did above. In this case, we can use column 130, ACS_15_5YR_B16001_Estimate; Total: - Hindi:, which gives us the number of Hindi speakers by county.
Let’s undo our graduated map by going to “Properties > Style” and changing our map back to a “Single Symbol” map. Next, let’s create an exploratory pie chart map, making a small (3-4) selection of languages to display in charts over our counties. You might select languages from the above table, for example:
- Estimate; Total: – Hindi:
- Estimate; Total: – Polish:
- Estimate; Total: – Portuguese or Portuguese Creole:
Right click on the shapefile layer, and select “Properties > Diagrams”. Check the box next to “Show diagrams for this layer.” Under “Available attributes”, scroll to select the desired languages and press the plus (+) sign to add them to the pie chart. Double click on the colors to modify them. Click “Ok”.
This kind of visualization may help you to decide which language you might want to explore in greater detail.
Let’s undo our pie chart by going to “Properties > Diagrams” and unchecking the box next to “Show diagrams for this layer.” For population data, it may be preferable to normalize the data by using a ratio or percentage, rather than visualizing raw counts. Ratios help to keep our data proportional. For example:
percent Hindi speakers = population of Hindi speakers / total population * 100.
Go to “Properties > Style”, and change the visualization type from a single symbol to a graduated map. Next to Column, instead of using the drop down menu, we will enter an expression. Click on the epsilon symbol and enter the following expression.[2]
"ACS_15_5YR_B16001_Estimate; Total: - Hindi:" / "ACS_15_5YR_B16001_Estimate; Total:" * 100
Choose the Equal Interval mode. This method sets the value ranges in each category equal in size. Click “Classify” and “OK”.
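The two steps above, normalizing to a percentage and then cutting the range into equal intervals, can be sketched as follows. The numbers are hypothetical and the function names are illustrative. QGIS performs the equivalent calculation for you when you click “Classify”.

```python
def percent(part, total):
    """Share of `part` in `total`, as a percentage."""
    return part / total * 100

def equal_interval_breaks(values, n_classes):
    """Upper bounds of n_classes bins that all span the same width."""
    low, high = min(values), max(values)
    width = (high - low) / n_classes
    return [low + width * i for i in range(1, n_classes + 1)]

def classify(value, breaks):
    """Index of the first bin whose upper bound is at least `value`."""
    for index, upper in enumerate(breaks):
        if value <= upper:
            return index
    return len(breaks) - 1  # guard against floating-point rounding at the top

# Hypothetical example: 50 speakers among 200 residents.
print(percent(50, 200))  # 25.0

# Hypothetical percentages for four counties, cut into four equal bins.
shares = [0.0, 1.0, 2.0, 4.0]
breaks = equal_interval_breaks(shares, 4)
print(breaks)                                 # [1.0, 2.0, 3.0, 4.0]
print([classify(s, breaks) for s in shares])  # [0, 0, 1, 3]
```

Notice that the third bin stays empty in this example. Equal intervals are easy to read, but with skewed data some classes may contain few or no counties.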
What criticisms would you make of the maps you have created? What are their strengths and weaknesses?
Will you potentially be able to use anything covered today in your work? What didn’t we cover that you were hoping to learn?
We likely won’t get to create a print-ready map during today’s workshop, but Maala Jhagra has prepared a short and to-the-point YouTube video on how to export high resolution maps with legends and scale bars. See https://youtu.be/OtVpOzLA_NM
We did not cover georeferencing of historic maps in today’s workshop, but Daniel McGlone gave a very detailed tutorial on this technique a few years ago. Go to https://drive.google.com/open?id=0B_Z0NuUvMPx4OUZ5cmVpN21MM1U&authuser=0 and follow the Georeferencing tutorial.
Bodenhamer, D. J., Corrigan, J. and Harris, T. M. 2010. The Spatial Humanities: GIS and the Future of Humanities Scholarship. Bloomington: Indiana University Press.
Guldi, J. 2011. “What Is the Spatial Turn?” Spatial Humanities: A Project of the Institute for Enabling Geospatial Scholarship.
Mullen, L. 2015. “Spatial Humanities Workshop”.*
* This handout borrows a great deal from Lincoln Mullen’s workshop.
1. For more information on Census IDs, see https://www.census.gov/geo/reference/geoidentifiers.html.
2. Note that if you copy/paste from the handout, there will be a line break in the middle of the expression. Remove it, or else the expression won’t work!