Big Data and Graphical Representation

Big Data is expected to play an important role in the Smart Grid. As more devices monitor and collect information about our energy consumption behavior, the amount of data will increase. This data can yield many insights. When we present this information to consumers, it matters not only what we present but also how we present it.

One of the classic books on graphical data representation is Edward Tufte’s ‘The Visual Display of Quantitative Information’. After a long time, I finally managed to knock this book off my reading list. The best insight in the book is one of Tufte’s Principles of Graphical Excellence:

“Give the viewer the greatest number of ideas in the shortest amount of time using the least ink and space”

This one sentence captures the essence of the book. We can even derive his other principles using the above principle as a sort of ‘fundamental rule’. In order to give the viewer ideas, the data has to be meaningful (have substance) and must support a theory. Well-designed graphics attract our attention and compel us to contemplate the data.

A lot of the graphics that we see in popular media are dumbed down. The writers assume that readers are not capable of understanding statistics and math, and they attempt to over-simplify things. This leads to erroneous graphics that mislead people. Tufte has devoted an entire chapter to ‘Graphical Integrity’. Once you read this book, you will never look at graphics in the general media without double-checking them for accuracy. Tufte presents plenty of examples of how even well-intentioned graphics can mislead us and give us wrong information.

How do you decide if a graphic is good? Is it possible to measure graphics for excellence? Tufte describes the concept of ‘data-ink’ to distinguish between good and bad graphics. The concept is very simple and is again related to what I called his fundamental rule. In order to use the least ink, a graphic must devote as large a share as possible of its total ink to showing data. In other words, the graphic should eliminate ink that does not represent data. In today’s digital publishing world, ‘ink’ might sound outdated, but the concept is certainly relevant for producing great graphics. Tufte walks through many examples and shows how graphics improve by maximizing the data-ink ratio.
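To make the ratio concrete, here is a minimal Python sketch. It treats the data-ink ratio as the ink spent on data divided by the total ink in the graphic. The chart elements, the ink amounts, and the function name `data_ink_ratio` are all made up for illustration and do not come from Tufte’s book; in practice, deciding which ink counts as ‘data’ is a judgment call.

```python
# Hypothetical ink budget for one bar chart, in arbitrary units
# (for example, pixels drawn). Names and numbers are illustrative only.
chart_ink = {
    "bars": 600,         # data
    "axis_labels": 100,  # not data, but needed to read the chart
    "gridlines": 300,    # not data
    "border": 200,       # not data
    "background": 800,   # not data
}
DATA_ELEMENTS = {"bars"}


def data_ink_ratio(ink, data_elements):
    """Return the share of total ink that is spent on showing data."""
    total = sum(ink.values())
    if total == 0:
        raise ValueError("chart uses no ink")
    data = sum(amount for name, amount in ink.items() if name in data_elements)
    return data / total


# Ratio for the original chart.
before = data_ink_ratio(chart_ink, DATA_ELEMENTS)

# Erase the decoration, keeping only the bars and the labels needed to read them.
slim_chart = {name: amount for name, amount in chart_ink.items()
              if name in {"bars", "axis_labels"}}
after = data_ink_ratio(slim_chart, DATA_ELEMENTS)

print(f"data-ink ratio: {before:.2f} -> {after:.2f}")
```

With these invented numbers, dropping the gridlines, border and background raises the ratio from 0.30 to about 0.86 without touching the data itself, which is the kind of improvement Tufte’s examples aim for.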

On a more contemporary note, if you happen to read Tufte’s book, I also highly recommend that you check out Bret Victor’s ‘Magic Ink’. While Tufte’s work is concerned with the graphical representation of data in general, Victor’s work is concerned specifically with designing good graphical user interfaces for software. The main takeaway from Victor’s paper is his classification of software programs. People use software for essentially three tasks: to Learn, to Create and to Communicate. Understanding the main reason why customers use your software can guide you to build better graphical interfaces.

In today’s world, Big Data, or any data for that matter, will ultimately be condensed in software into a format that users can understand. Therefore it is essential that product developers have a good sense of the issues involved in presenting data in software programs. Tufte and Victor offer a great starting point for understanding these issues.
