January 27, 2015

Lesson from emacs: keep data easy to manipulate and easy to present

Spreadsheets

Spreadsheets are evil. And do you know why? Because they're too powerful. Here is the sweet spot of spreadsheets: tables of related data on which calculations need to be made. Here is not the sweet spot of spreadsheets: everything else.

Spreadsheet software is so powerful nowadays that you can make visualizations for almost anything. And some people are happy to spend a really long time creating and maintaing those, but I am WAY, WAY lazier than those people are. I've also learned that there was such a thing as the Single Responsibility Principle.

Single Responsibility Principle

The Single Responsibility Principle, in code, states that a class (or a function, or really any logical set of code) must have one and only one reason to change. This is very strong.

Let's take a spreadsheet which is a visualization of a schedule. This spreadsheet can change if we're going to change the color a particular scheduled item, or the dates of the scheduled item, or who's doing the scheduled thing, or the granularity or the schedule, or something around a scheduled item. Not that I'm counting, but I think that's more than one.

Separation of concerns

Separating your concerns helps with that. Among other things, a single concern is, well, a single thing, so it helps make sure there's just one reason for a set of code to change. When I'm looking at data, I've basically got three concerns:

  1. Storing the data
  2. Manipulating the data
  3. Visualizing the data

If I can find a way of storing the data in such a way that manipulating it is about as painless as it gets, then I am much more likely to want to play with the data, and therefore I am much more likely to come to new realizations about it.

emacs

Emacs is a beast with many uses, but it is definitely organized. And it is, of course, mostly about modifying plain-text, though it has a large support for richer formats. Many of the plain-text modes support exporting and converting to other formats, which is also a driving force behind many of the powerful use cases for emacs.

org-mode

Enter org-mode.

Org-mode is about organizing things. It's basically an outline mode with Getting Things Done features on crack. My favorite org-mode feature to date is how it handles tables, precisely because it is such a wonderful example of the separation of concerns that I'm talking about here. Here's a small example.

Tables

Name Age Date of Birth Income
Joe 30 <1984-02-20 Mon> 84000
Mark 40 <1974-04-15 Mon> 120000
Sophie 44 <1970-08-28 Fri> 150000
Anna 26 <1988-02-29 Mon> 160000

Here is how I wrote this table:

  1. I typed |Name|Age|Date of Birth|Income| and then I pressed
  2. I typed Joe, , , C-c . (Ctrl+C then the period character), which dropped me in emacs' calendar, I wrote a date and pressed enter, then
  3. I typed 84000 then I pressed
  4. I repeated for the next lines
  5. I wrote the last line, a formula line, to calculate how old they are today, and pressed C-u C-c C-c, which is "recalculate everything"

And this is how it looked to me in plain text:


| Name   | Age | Date of Birth    | Income |
|--------+-----+------------------+--------|
| Joe    |  30 | <1984-02-20 Mon> |  84000 |
| Mark   |  40 | <1974-04-15 Mon> | 120000 |
| Sophie |  44 | <1970-08-28 Fri> | 150000 |
| Anna   |  26 | <1988-02-29 Mon> | 160000 |
#+TBLFM: $2=(now()-$3)/365;%d

It is important to note that emacs' org-mode did all the row alignment for me. Honesty dictates that I mention I actually had to ask how to do the calculation here, because I didn't know about the `now()` function.

But let's say I realize that the date of birth is better before the age:

Name Date of Birth Age Income
Joe <1984-02-20 Mon> 30 84000
Mark <1974-04-15 Mon> 40 120000
Sophie <1970-08-28 Fri> 44 150000
Anna <1988-02-29 Mon> 26 160000

Here's how much work this was:

  1. Go to the Age column
  2. Press Alt+Right
  3. Watch as the column moves and the formula is updated, all for me. (note in the example below how the formula changed!)

| Name   | Date of Birth    | Age | Income |
|--------+------------------+-----+--------|
| Joe    | <1984-02-20 Mon> |  30 |  84000 |
| Mark   | <1974-04-15 Mon> |  40 | 120000 |
| Sophie | <1970-08-28 Fri> |  44 | 150000 |
| Anna   | <1988-02-29 Mon> |  26 | 160000 |
#+TBLFM: $3=(now()-$2)/365;%d

And now what if I want the names to be ordered alphabetically, like so?

Name Age Date of Birth Income
Anna 26 <1988-02-29 Mon> 160000
Joe 30 <1984-02-20 Mon> 84000
Mark 40 <1974-04-15 Mon> 120000
Sophie 44 <1970-08-28 Fri> 150000

| Name   | Age | Date of Birth    | Income |
|--------+-----+------------------+--------|
| Anna   |  26 | <1988-02-29 Mon> | 160000 |
| Joe    |  30 | <1984-02-20 Mon> |  84000 |
| Mark   |  40 | <1974-04-15 Mon> | 120000 |
| Sophie |  44 | <1970-08-28 Fri> | 150000 |
#+TBLFM: $2=(now()-$3)/365;%d

Simple enough! There's a function in org-mode called org-table-sort-lines that'll do it for me. And now I notice that when I made up the data for this table, as the names go up in the alphabet, so do the ages go up. Yet, even if I did not know about it, I could have just done the following:

  1. Go to a row
  2. Press Alt-Up or Alt-Down to move it up or down
  3. Repeat until rows are organized as I wish

And do you know how much work it was to transform all this plain text into the HTML you're reading? Barely any, in large part because the blogging engine I use is an emacs package.

The beautiful thing about this is: if I find that I need to reorganize my sections, I can equally simply use Alt-Up or Alt-Down to move them, along with all the text below it, up or down. And then I can re-generate the HTML.

I do the work I want to do in plain-text, because it can be made very easy to manipulate, and then we come up with a powerful visualization. The visualization is only for reading, not for manipulating. A visualization is generated, but otherwise generally static

Other examples of visualization or otherwise human-readable data, unsuited for manipulation

My favorite other example is SQL versus key-value stores. The fairly famous D3.js library is also a good example.

Back to spreadsheets

Use spreadsheets, but only to learn things about your data. Then find a visualization that makes sense, and generate it based on your data. You'll be much happier in the long run. I promise.

© Aldric Giacomoni 2017