April 2, 2017
A child of five could understand this. Send someone to fetch a child of five.
Simple is never that simple.
My goal with these pages is to learn about d3 and to try out the page style defined by tufte-css, which is based on the style of Edward Tufte's books: The Visual Display of Quantitative Information (1983); Envisioning Information (1990); Visual Explanations: Images and Quantities, Evidence and Narrative (1997); and Beautiful Evidence (2006).
In my day-to-day work I rarely have reason to produce graphics of any interest: time series, mostly, or simple tables. But I have wanted to learn how to make more interesting visualizations, and to apply some of Edward Tufte's (ET's) principles. Recently I came across tufte-css. (ET's style has also inspired Tufte Handout, an attempt to make ET-style typography and page layout available for web pages.) Whether that style carries over well to web pages is an open question, but it is a style I like and would like to try, and that is what got me started on this project.
The tufte-css project provides CSS styles and fonts that approximate the text and page-layout style of Tufte's books, but offers nothing to help with graphics. Tufte Handout does provide such support, using ggplot2, but it is only available in the R environment.
Since I was already interested in d3, I decided that these pages would be a good place to try it out. d3 is a well-designed library (40 or more libraries, really) with a beautifully consistent and effective set of principles.
This page provides examples of the basic features of tufte-css, and some examples of "simple" d3.
Tables can be created using any of the formats supported by pandoc, or, as here, by loading data from an external source and using d3 to generate the table.
The data in this table and in the following charts are taken from ftp.ncdc.noaa.gov/pub/data/ghcn/daily/, specifically ghcnd_all.tar.gz. This is a large dataset of historical daily weather data from about 100,000 weather reporting stations, covering the period from 1892 to the present. The file ghcnd-stations.txt maps each station ID to its location.
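The daily records inside ghcnd_all.tar.gz are fixed-width .dly files, one line per station, month, and element. The TypeScript below is a minimal sketch of reading one such line, assuming the column layout given in the GHCN-Daily readme.txt: station ID in columns 1-11, then year, month, and a four-letter element code, followed by 31 daily slots of a 5-character value plus three 1-character flags. It also assumes -9999 marks a missing day and that TMAX/TMIN are stored in tenths of a degree Celsius. The names parseDlyLine and tenthsCToF are hypothetical, not part of d3 or any NOAA tool, and the sketch ignores the quality flags, which a real pipeline should check.

```typescript
// One parsed .dly line: one station, one month, one element, 31 day slots.
interface DlyRecord {
  id: string;
  year: number;
  month: number;
  element: string;           // e.g. "TMAX" or "TMIN"
  values: (number | null)[]; // raw values; null where the day is missing
}

// Parse a fixed-width .dly line (assumed layout from the GHCN-Daily readme).
function parseDlyLine(line: string): DlyRecord {
  const values: (number | null)[] = [];
  for (let day = 0; day < 31; day++) {
    // Each day occupies 8 characters: a 5-character value, then
    // MFLAG, QFLAG, and SFLAG (all ignored in this sketch).
    const start = 21 + day * 8;
    const field = line.slice(start, start + 5).trim();
    const raw = Number(field);
    values.push(field === "" || raw === -9999 ? null : raw);
  }
  return {
    id: line.slice(0, 11),
    year: Number(line.slice(11, 15)),
    month: Number(line.slice(15, 17)),
    element: line.slice(17, 21),
    values,
  };
}

// Temperatures are assumed to be in tenths of a degree Celsius; convert to °F.
function tenthsCToF(tenths: number): number {
  return (tenths / 10) * 9 / 5 + 32;
}

// Usage: daily highs in °F for one record, skipping missing days.
// const highsF = parseDlyLine(line).values
//   .filter((v): v is number => v !== null)
//   .map(tenthsCToF);
```

Converting once, at load time, keeps every later step (table, bar chart, box plot) working in the °F the charts display.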
The bar chart is the canonical introductory d3 example, though not necessarily a great way to present data. It is a gentle way to practice SVG, d3 scales, axes, and the "margin convention."
The chart is mostly cribbed from Let's Make a Bar Chart, but so, as far as I can tell, are most simple d3 examples.
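The margin convention and a linear scale are small enough to sketch without d3 itself. The TypeScript below is a dependency-free approximation, not d3's implementation: innerSize computes the inner drawing area and the transform for the chart group, linearScale mimics only the core mapping of d3.scaleLinear (no clamping, ticks, or nice()), and barGeometry shows why a vertical bar's height is innerHeight - y(value) when SVG's y axis points down. All three function names and the margin values are illustrative assumptions.

```typescript
interface Margin {
  top: number;
  right: number;
  bottom: number;
  left: number;
}

// Margin convention: subtract the margins from the outer SVG size to get
// the inner drawing area, then translate the chart group by (left, top).
function innerSize(outerWidth: number, outerHeight: number, m: Margin) {
  return {
    width: outerWidth - m.left - m.right,
    height: outerHeight - m.top - m.bottom,
    transform: `translate(${m.left},${m.top})`,
  };
}

// Core of a linear scale: map domain [d0, d1] onto range [r0, r1].
// Unlike d3.scaleLinear there is no clamping, ticks(), nice(), or invert().
function linearScale(
  [d0, d1]: [number, number],
  [r0, r1]: [number, number],
): (x: number) => number {
  return (x) => r0 + ((x - d0) / (d1 - d0)) * (r1 - r0);
}

// SVG's y axis points down, so the y range is [height, 0]; a bar starts at
// y(value) and extends down to the x axis at y = height.
function barGeometry(value: number, y: (v: number) => number, height: number) {
  const top = y(value);
  return { y: top, height: height - top };
}

// Usage with illustrative sizes: a 960x500 SVG and a 0-100 °F domain.
const inner = innerSize(960, 500, { top: 20, right: 30, bottom: 30, left: 40 });
const y = linearScale([0, 100], [inner.height, 0]);
const bar = barGeometry(50, y, inner.height); // { y: 225, height: 225 }
```

Because everything inside the translated group works in inner coordinates, the axes can be placed at 0 and inner.height without further arithmetic, which is the main payoff of the convention.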
The bar chart uses a lot of screen real estate to display just 12 numbers and labels! Let's see if we can do better.
One way to get a better view of the data is a box plot. For this example we use the same dataset, but show the 5th, 25th, 50th, 75th, and 95th percentiles of the temperature distribution, as well as the absolute minimum and maximum values. Each box extends from p25 to p75, the lines extend to p05 and p95, and the outlying dots mark the minimum and maximum. You can view the exact values for a measurement by hovering over its box.
High and Low Temperature Ranges, Bisbee AZ, 1997-2016
There, that's better. Visually it's not great, but we are able to show a lot more data, and we get a much better idea of the temperature distributions. For example, we can see that July low temperatures are almost always within a narrow range, but have extreme outliers.
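The percentiles behind each box can be computed with linear interpolation between order statistics, which is the method d3.quantile uses (sometimes called R-7). A minimal TypeScript sketch, assuming the input is a plain array of temperatures for one month; quantile and boxSummary are hypothetical names, not d3 functions:

```typescript
// Quantile by linear interpolation between order statistics (the R-7
// method, as in d3.quantile). Sorts a copy, so input order does not matter.
function quantile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("quantile of empty array");
  const s = [...values].sort((a, b) => a - b);
  const i = (s.length - 1) * p;
  const lo = Math.floor(i);
  const hi = Math.min(lo + 1, s.length - 1);
  return s[lo] + (s[hi] - s[lo]) * (i - lo);
}

// The seven values behind each box: min/max dots, p05/p95 line ends,
// p25/p75 box edges, and p50.
interface BoxSummary {
  min: number;
  p05: number;
  p25: number;
  p50: number;
  p75: number;
  p95: number;
  max: number;
}

function boxSummary(values: number[]): BoxSummary {
  return {
    min: Math.min(...values),
    p05: quantile(values, 0.05),
    p25: quantile(values, 0.25),
    p50: quantile(values, 0.5),
    p75: quantile(values, 0.75),
    p95: quantile(values, 0.95),
    max: Math.max(...values),
  };
}
```

Other quantile definitions give slightly different values for small samples, so matching d3's method keeps these numbers consistent with anything computed in the browser.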
Now let's look at how temperature has changed, if at all, over the 20-year period. To do that, we will show the data for each month separately in a set of 12 small multiples.
The small multiples show the mean, 10th percentile, and 90th percentile temperatures, in °F, for each month over the 20-year period.
There are no obvious trends here. February may have gotten a little warmer, and August and September a little cooler. But with just 20 years of data we wouldn't expect to see much change, and we don't.
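The points in the small multiples can be computed by grouping daily readings by month and then by year. A minimal sketch, assuming each reading has already been converted to °F and that each plotted point summarizes one month of one year; the Reading type and the smallMultiples function are hypothetical, and quantile uses the same linear interpolation as d3.quantile:

```typescript
interface Reading {
  year: number;
  month: number; // 1-12
  value: number; // temperature in °F
}

interface YearPoint {
  year: number;
  mean: number;
  p10: number;
  p90: number;
}

// Same R-7 interpolation as d3.quantile; callers never pass an empty array.
function quantile(values: number[], p: number): number {
  const s = [...values].sort((a, b) => a - b);
  const i = (s.length - 1) * p;
  const lo = Math.floor(i);
  const hi = Math.min(lo + 1, s.length - 1);
  return s[lo] + (s[hi] - s[lo]) * (i - lo);
}

// Group readings by month, then by year, and summarize each group.
// The result maps month -> one point per year: one small multiple per month.
function smallMultiples(readings: Reading[]): Map<number, YearPoint[]> {
  const byMonth = new Map<number, Map<number, number[]>>();
  for (const r of readings) {
    let byYear = byMonth.get(r.month);
    if (!byYear) {
      byYear = new Map<number, number[]>();
      byMonth.set(r.month, byYear);
    }
    const vals = byYear.get(r.year) ?? [];
    vals.push(r.value);
    byYear.set(r.year, vals);
  }

  const result = new Map<number, YearPoint[]>();
  const months = [...byMonth.keys()].sort((a, b) => a - b);
  for (const month of months) {
    const byYear = byMonth.get(month)!;
    const years = [...byYear.keys()].sort((a, b) => a - b);
    result.set(
      month,
      years.map((year) => {
        const vals = byYear.get(year)!;
        const mean = vals.reduce((sum, v) => sum + v, 0) / vals.length;
        return { year, mean, p10: quantile(vals, 0.1), p90: quantile(vals, 0.9) };
      }),
    );
  }
  return result;
}
```

Sorting months and years explicitly means the panels and their x axes come out in calendar order regardless of the order in which the raw records were read.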