Data Visualization (or "Infoporn" as I like to call it) has been a passion of mine for many years. Most of my career as both a developer and manager has been in the development of software that visualizes large sets of data. For the most part, my work has been around energy industry data but I'm often up late into the night tinkering with data sets I find online.
Over the last couple of years, the visualization of data has taken off and become much more popular than in the past. What used to be the exclusive domain of formal textbooks and students in specialized design programs has become accessible to a wider audience. As I think about it, I suspect the reason for this growth in popularity is the convergence of several factors:
- There is a TON of public data available online. Over the years, I've collected a variety of interesting large public data sets, such as AOL search data, Enron email messages, and Netflix movie ratings. Peruse the "publicdata" tag on del.icio.us and you'll find more data than you can shake a chart at. In addition, the popularity of web services and public APIs for data has exploded in the last couple of years. These are ideal for fetching current, dynamic data including weather, stock prices, and other financial data. There are also web sites that catalog the wide variety of web service APIs available online. The popularity of online "mashups" (the combining of two or more web services to create something completely new) has grown very quickly, particularly with the arrival of online mapping services like Google Maps and Virtual Earth. These days, popular web sites that don't provide an API for programmatic access quickly catch heat for their omission.
- Data has become "social" -- though in a "Web 2.0" world, what hasn't? Seriously, there have been some great "social data" sites cropping up over the last couple of years. These sites let anyone upload, visualize, browse, and share their data. Don't like the way some data on these sites is represented? Chart it yourself. The hallmark examples here are Swivel (blog) and Many Eyes (from IBM, also with a blog), though there are other similar sites as well.
- Visualization tools have become more commonplace. In addition to Microsoft improving the charting tools in each new version of Excel, nearly every programming language out there has third-party graphics and charting libraries available for it. For many developers, adding basic charting capability to an application has become a fairly simple, plug-and-play affair. That said, it's still too easy to create charts that are ugly and do a poor job of communicating information. In the same way that the rise of desktop publishing tools in the '80s and '90s made for a lot of horrible newsletters and brochures, the increasing number of charting and visualization tools means we're seeing a lot of really bad data presentations. Go ask Edward Tufte (a "founding father" of modern data visualization) about PowerPoint or Stephen Few about Business Objects to see what I mean (Few refers to the charts from one Business Objects product as "data visualization Happy Meals" -- not a compliment). Still... it's an exciting time right now for this field.
- Development tools have improved greatly in their handling of data. Most development platforms/environments have some sort of abstraction layer or available data-access tools to ease the querying and manipulation of data. For dealing with local data, it's rare to have to write new code from scratch to ingest and parse data -- most tools have libraries for standard formats like XML or CSV, as well as straightforward APIs for working with relational databases. For remote data, there are lots of tools that quickly generate a local proxy or wrapper around standard web services.
- The development tools for creating and manipulating graphics have similarly improved. Writing code to create on-screen graphics used to be something that an elite few programmers could do -- it typically required very strong C++ skills, in-depth knowledge of complex graphics libraries, and a background in physics and 3D modeling. Now, most modern platforms have relatively approachable APIs for drawing points, lines, regions, and text on screen - as well as simplified APIs for 3D manipulation.
- Also on the graphics front, there's Processing - a development environment designed and developed specifically for visualization. It's built on top of Java, but its creators (Ben Fry and Casey Reas) and collaborators have done a great job of balancing approachability (for designers or those new to programming) and power (for those who want to create advanced, interactive visualizations). If you're interested in checking out Processing (which is free and open source and a lot of fun and so you totally should), I'd recommend Fry's book, "Visualizing Data" (published last year by O'Reilly)... Jeff Atwood calls Fry "Edward Tufte armed with a compiler" and I've found the book to be an excellent walkthrough for Processing. Additionally, it's a good introduction to the thought process involved in creating an effective visualization.
- Computing power and storage are cheap and plentiful. It takes a lot of processor cycles to render graphics and a lot of storage space to keep all that data. Thankfully, even a "low-end" machine these days has a ridiculous amount of processing power, and 250GB is a common starting point for hard drive sizes. I recently purchased a 750GB drive for my Windows Home Server machine and its cost was roughly $0.20 per gigabyte. While marveling about that the other day, it occurred to me that my very first hard drive (a noisy 10MB beast given to me in the late '80s by a generous uncle) would be insufficient to hold even ONE raw photo from my new camera (a 12-megapixel Nikon D300). Insane. Thank you Mr. Moore and Mr. Kryder.
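To give a feel for how little code the "ingest and parse" step takes these days, here's a minimal Python sketch that reads CSV with the standard library and draws a crude text bar chart. The sample data, the column names (`category`, `count`), and the helper names `load_rows` and `text_bar_chart` are all made up for illustration; a real script would open a file you downloaded instead.

```python
import csv
import io

# Made-up sample data for illustration only; swap in a real CSV file.
SAMPLE_CSV = """category,count
alpha,120
beta,95
gamma,150
"""


def load_rows(text):
    """Parse CSV text into a list of (label, value) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["category"], float(row["count"])) for row in reader]


def text_bar_chart(rows, width=30):
    """Return one line per row, scaling the largest value to `width` characters."""
    largest = max(value for _, value in rows)
    lines = []
    for label, value in rows:
        bar = "#" * round(value / largest * width)
        lines.append(f"{label} {bar} {value:g}")
    return lines


if __name__ == "__main__":
    for line in text_bar_chart(load_rows(SAMPLE_CSV)):
        print(line)
```

It's nowhere near a Tufte-approved graphic, but it shows the point: the parsing is a library call, and the interesting work is deciding how to present the numbers.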
Given all of the above, it's a great time to be a data geek. Even if you're not interested in designing visualizations of your own, there are lots of blogs and sites that catalog the best infoporn from across the web. It's amazing to see so many projects coming out that are both informative and aesthetically pleasing. The thumbnail below is an example from this week - it's essentially an interactive "area chart over a timeline" showing the Box Office Receipts for movies from 1986 to 2007, designed and built by the New York Times data visualization team (they've been doing some amazing stuff recently).
In addition to checking out my del.icio.us "infoporn" links, you might want to look over some of the feeds I've subscribed to:
In coming posts, I'll link to some examples of visualizations that I find to be the most impressive, informative, and even humorous.