Lesson from emacs: keep data easy to manipulate and easy to present
Contents
Spreadsheets
Spreadsheets are evil. And do you know why? Because they're too powerful. Here is the sweet spot of spreadsheets: tables of related data on which calculations need to be made. Here is not the sweet spot of spreadsheets: everything else.
Spreadsheet software is so powerful nowadays that you can make visualizations for almost anything. And some people are happy to spend a really long time creating and maintaing those, but I am WAY, WAY lazier than those people are. I've also learned that there was such a thing as the Single Responsibility Principle.
Single Responsibility Principle
The Single Responsibility Principle, in code, states that a class (or a function, or really any logical set of code) must have one and only one reason to change. This is very strong.
Let's take a spreadsheet which is a visualization of a schedule. This spreadsheet can change if we're going to change the color a particular scheduled item, or the dates of the scheduled item, or who's doing the scheduled thing, or the granularity or the schedule, or something around a scheduled item. Not that I'm counting, but I think that's more than one.
Separation of concerns
Separating your concerns helps with that. Among other things, a single concern is, well, a single thing, so it helps make sure there's just one reason for a set of code to change. When I'm looking at data, I've basically got three concerns:
-
Storing the data
-
Manipulating the data
-
Visualizing the data
If I can find a way of storing the data in such a way that manipulating it is about as painless as it gets, then I am much more likely to want to play with the data, and therefore I am much more likely to come to new realizations about it.
Emacs
Emacs is a beast with many uses, but it is definitely organized. And it is, of course, mostly about modifying plain-text, though it has a large support for richer formats. Many of the plain-text modes support exporting and converting to other formats, which is also a driving force behind many of the powerful use cases for emacs.
org-mode
Enter org-mode. Org-mode is about organizing things. It's basically an outline mode with Getting Things Done features on crack. My favorite org-mode feature to date is how it handles tables, precisely because it is such a wonderful example of the separation of concerns that I'm talking about here. Here's a small example.
Tables
Name | Age | Date of Birth | Income |
---|---|---|---|
Joe | 30 | 84000 | |
Mark | 40 | 120000 | |
Sophie | 44 | 150000 | |
Anna | 26 | 160000 |
Here is how I wrote this table:
-
I typed |Name|Age|Date of Birth|Income| and then I pressed <TAB>
-
I typed Joe, <TAB>, <TAB>, C-c . (Ctrl+C then the period character), which dropped me in emacs' calendar, I wrote a date and pressed enter, then <TAB>
-
I typed 84000 then I pressed <TAB>
-
I repeated for the next lines
-
I wrote the last line, a formula line, to calculate how old they are today, and pressed C-u C-c C-c, which is "recalculate everything"
And this is how it looked to me in plain text:
| Name | Age | Date of Birth | Income | |--------+-----+------------------+--------| | Joe | 30 | <1984-02-20 Mon> | 84000 | | Mark | 40 | <1974-04-15 Mon> | 120000 | | Sophie | 44 | <1970-08-28 Fri> | 150000 | | Anna | 26 | <1988-02-29 Mon> | 160000 | #+TBLFM: $2=(now()-$3)/365;%d
It is important to note that emacs' org-mode did all the row alignment for me. Honesty dictates that I mention I actually had to ask how to do the calculation here, because I didn't know about the `now()` function.
But let's say I realize that the date of birth is better before the age:
Name | Date of Birth | Age | Income |
---|---|---|---|
Joe | 30 | 84000 | |
Mark | 40 | 120000 | |
Sophie | 44 | 150000 | |
Anna | 26 | 160000 |
Here's how much work this was:
-
Go to the Age column
-
Press Alt+Right
-
Watch as the column moves and the formula is updated, all for me. (note in the example below how the formula changed!)
| Name | Date of Birth | Age | Income | |--------+------------------+-----+--------| | Joe | <1984-02-20 Mon> | 30 | 84000 | | Mark | <1974-04-15 Mon> | 40 | 120000 | | Sophie | <1970-08-28 Fri> | 44 | 150000 | | Anna | <1988-02-29 Mon> | 26 | 160000 | #+TBLFM: $3=(now()-$2)/365;%d
And now what if I want the names to be ordered alphabetically, like so?
Name | Age | Date of Birth | Income |
---|---|---|---|
Anna | 26 | 160000 | |
Joe | 30 | 84000 | |
Mark | 40 | 120000 | |
Sophie | 44 | 150000 |
| Name | Age | Date of Birth | Income | |--------+-----+------------------+--------| | Anna | 26 | <1988-02-29 Mon> | 160000 | | Joe | 30 | <1984-02-20 Mon> | 84000 | | Mark | 40 | <1974-04-15 Mon> | 120000 | | Sophie | 44 | <1970-08-28 Fri> | 150000 | #+TBLFM: $2=(now()-$3)/365;%d
Simple enough! There's a function in org-mode called org-table-sort-lines that'll do it for me. And now I notice that when I made up the data for this table, as the names go up in the alphabet, so do the ages go up. Yet, even if I did not know about it, I could have just done the following:
-
Go to a row
-
Press Alt-Up or Alt-Down to move it up or down
-
Repeat until rows are organized as I wish
And do you know how much work it was to transform all this plain text into the HTML you're reading? Barely any, in large part because the blogging engine I use is an emacs package.
The beautiful thing about this is: if I find that I need to reorganize my sections, I can equally simply use Alt-Up or Alt-Down to move them, along with all the text below it, up or down. And then I can re-generate the HTML.
I do the work I want to do in plain-text, because it can be made very easy to manipulate, and then we come up with a powerful visualization. The visualization is only for reading, not for manipulating. A visualization is generated, but otherwise generally static
Other examples of visualization or otherwise human-readable data, unsuited for manipulation
My favorite other example is SQL versus key-value stores. The fairly famous D3.js library is also a good example.
Back to spreadsheets
Use spreadsheets, but only to learn things about your data. Then find a visualization that makes sense, and generate it based on your data. You'll be much happier in the long run. I promise.
Author Aldric Giacomoni
LastMod 2015-01-27