Thursday, 26 June 2014

ggpie: pie graphs in ggplot2

How does one make a pie chart in R using ggplot2? (If you're impatient: see the final code here).

I know, I know, pie charts are often not very good ways of displaying data. It is hard to visually compare the relative sizes of slices (particularly the smaller ones, or slices that are separated by larger ones), and you get no sense of scale.

However, sometimes a good ol' pie chart can convey your point in a way that many of the general public will find easy to understand. For example, this is how I've spent my "work" hours this week so far:

(I manually inserted the line break into "marking exams" so it wouldn't get cut off - not sure how to automatically do this).

Terrible, I know. But the plot conveys quite clearly the point I want to make: I have wasted almost half (!) of my week so far playing nethack when I should have been doing my PhD, roughly twice the time I have spent on the PhD itself!

(Aside: nethack is a most excellent ASCII game that can be obtained here, or from your repositories on most Linux systems. In my defence, during the month of June the Junethack tournament is run and clan overcaffeinated (me) is in a deadly battle with clan demilichens to win the "most unique deaths" trophy. And this is the last week of June).

How to do it

My data looks like this (I've been using the hamster time tracker, though I keep forgetting to track things):

How I've spent my PhD hours this week

activity           time
Nethack            15.2
PhD                7.4
Marking exams      4
Meetings           2.3
Lunch              2
Writing this post  1.5
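
For reference, here is a minimal sketch of how this table could be entered as the data frame df that the code below uses. The column names come from the table. Keeping the factor levels of activity in row order (rather than R's default alphabetical order) is an assumption, made so that the stacking order matches the row order used for the label positions later on.

```r
# Assumed reconstruction of the data frame `df` used in the rest of the post.
activities <- c("Nethack", "PhD", "Marking exams",
                "Meetings", "Lunch", "Writing this post")
df <- data.frame(
    # assumption: factor levels follow the row order, not alphabetical order
    activity = factor(activities, levels = activities),
    # hours spent on each activity, copied from the table
    time = c(15.2, 7.4, 4, 2.3, 2, 1.5)
)
str(df)
```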

My first attempt at building a pie chart of this data follows the ggplot2 documentation for coord_polar and this excellent post on r-chart. There are also a number of relevant questions on StackOverflow.

First we construct a stacked bar chart, coloured by the activity type (fill=activity). Note that x=1 is a dummy variable purely so that ggplot has an x variable to plot by (we will remove the label later).

(Aside: if x is a factor (e.g. factor("dummy")) for some reason the bar does not get full width, and the resulting pie chart has a funny little hole in the middle).

library(ggplot2)
p <- ggplot(df, aes(x=1, y=time, fill=activity)) +
        geom_bar(stat="identity") +
        ggtitle("How I've spent my PhD hours this week")
print(p)

Then we use coord_polar() to turn it into a pie chart:

p <- p + coord_polar(theta='y')
print(p)

(Aside: does anyone find it weird that although my x axis was 1 and my y axis was time, on the resultant graph the x axis is now 'time' and the y axis is now 1?)

Let's tweak this so it doesn't look so bad.

First, some pure aesthetics: let's outline each slice of the pie in black using color='black' in geom_bar(). This also draws an ugly black diagonal line and outline on each square of the legend, so we remove those with guides().

# We have to start the plot again because `color='black'` goes into geom_bar
p <- ggplot(df, aes(x=1, y=time, fill=activity)) +
        ggtitle("How I've spent my PhD hours this week") +
        coord_polar(theta='y')
p <- p +
        # black border around pie slices
        geom_bar(stat="identity", color='black') +
        # remove black diagonal line from legend
        guides(fill=guide_legend(override.aes=list(colour=NA)))
print(p)

Now, remove the axes:

  • both the axis labels ("time", "1"),
  • the tick marks on the vertical axis,
  • the tick labels on the vertical axis (0.75, 1.00, 1.25).

Note - for some reason, although 1 was our x variable, you remove its tick label by setting axis.text.y...

p <- p +
    theme(axis.ticks=element_blank(),  # the axis ticks
          axis.title=element_blank(),  # the axis labels
          axis.text.y=element_blank()) # the 0.75, 1.00, 1.25 labels.
print(p)

Now, I should also remove the '0', '5', '10', '15' tick marks/labels, being the cumulative number of hours spent so far, since they're not meaningful.

However, I want to label each slice of the pie, and it is convenient to put my labels in place of the '0', '5', etc.

In terms of their position, they should be located at the midpoint of each pie slice. Think back to the stacked bar chart I produced at the start, and recall that the y axis (time) shows cumulative hours spent.

This means the y coordinate of the end of each slice is given by cumsum(df$time) (cumulative sum of time spent so far), and so the coordinate of the midpoint of each slice is given by:

y.breaks <- cumsum(df$time) - df$time/2
y.breaks
## [1]  7.60 18.90 24.60 27.75 29.90 31.65

In order to implement this in the pie chart, we use scale_y_continuous with the breaks argument being the coordinates we've just calculated, and the labels argument being the activity name.

p <- p +
    # prettiness: make the labels black
    theme(axis.text.x=element_text(color='black')) +
    scale_y_continuous(
        breaks=y.breaks,   # where to place the labels
        labels=df$activity # the labels
    )
print(p)
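
One caveat, sketched here on the assumption that you are running ggplot2 2.2.0 or later: from that release, stacked bars are stacked in the reverse of the grouping order, so the first level of activity ends up at the far end of the cumulative axis rather than at the start. If the labels land on the wrong slices, the midpoints can be measured from the other end instead. The snippet below only does the arithmetic. The time values are copied from the table above, mid_reversed is a hypothetical name, and it assumes the factor levels of activity are in the same order as the rows.

```r
# hours per activity, in row order (copied from the table in the post)
time <- c(15.2, 7.4, 4, 2.3, 2, 1.5)

# midpoints when the first row is stacked first (the calculation used above)
mid_forward <- cumsum(time) - time / 2

# midpoints when the first row is stacked last
# (assumed behaviour of ggplot2 >= 2.2.0)
mid_reversed <- sum(time) - mid_forward
mid_reversed
```

Passing mid_reversed as the breaks argument in place of y.breaks should then line the labels up with their slices again.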

Altogether now:

p <- ggplot(df, aes(x=1, y=time, fill=activity)) +
        ggtitle("How I've spent my PhD hours this week") +
        # black border around pie slices
        geom_bar(stat="identity", color='black') +
        # remove black diagonal line from legend
        guides(fill=guide_legend(override.aes=list(colour=NA))) +
        # polar coordinates
        coord_polar(theta='y') +
        # label aesthetics
        theme(axis.ticks=element_blank(),  # the axis ticks
              axis.title=element_blank(),  # the axis labels
              axis.text.y=element_blank(), # the 0.75, 1.00, 1.25 labels
              axis.text.x=element_text(color='black')) +
        # pie slice labels
        scale_y_continuous(
            breaks=cumsum(df$time) - df$time/2,
            labels=df$activity
        )

TL;DR

I've tied it together into a function ggpie:

library(ggplot2)
# ggpie: draws a pie chart.
# give it:
# * `dat`: your dataframe
# * `by` {character}: the name of the fill column (factor)
# * `totals` {character}: the name of the column that tracks
#    the time spent per level of `by` (percentages work too).
# returns: a plot object.
ggpie <- function (dat, by, totals) {
    # x=1 is a dummy numeric x (see the aside above about factors and the hole)
    ggplot(dat, aes_string(x=1, y=totals, fill=by)) +
        geom_bar(stat='identity', color='black') +
        # removes black borders from legend
        guides(fill=guide_legend(override.aes=list(colour=NA))) +
        coord_polar(theta='y') +
        theme(axis.ticks=element_blank(),
              axis.text.y=element_blank(),
              axis.text.x=element_text(colour='black'),
              axis.title=element_blank()) +
        # label each slice at its midpoint
        scale_y_continuous(breaks=cumsum(dat[[totals]]) - dat[[totals]] / 2,
                           labels=dat[[by]])
}

For example:

library(grid) # for `unit`
ggpie(df, by='activity', totals='time') +
    ggtitle("A fun but wasteful week.") +
    theme(axis.ticks.margin=unit(0,"lines"),
          plot.margin=rep(unit(0, "lines"),4))

(Note: for some reason the pie plots have an unreasonably large amount of white space, and the theme(*.margin) settings are an attempt to control that. However, I still get a lot of vertical space that I'm not sure how to compress).

Clearly the labels could do with more work (if they are too long they go out of the plot boundary, or they bump into the pie chart), but not bad for an hour and a half's work :)
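
One possible way to automate the manual line break mentioned near the top is base R's strwrap(), which splits a string at word boundaries. This is only a sketch: wrap_labels is a hypothetical helper, not part of ggpie, and the width of 10 characters is an arbitrary choice.

```r
# Hypothetical helper: wrap each label at word boundaries and rejoin the
# pieces with newlines, so long labels take up several short lines.
wrap_labels <- function(labels, width = 10) {
    vapply(as.character(labels),
           function(s) paste(strwrap(s, width = width), collapse = "\n"),
           character(1), USE.NAMES = FALSE)
}

wrapped <- wrap_labels(c("Nethack", "Marking exams", "Lunch"))
cat(wrapped, sep = "\n---\n")
```

The wrapped vector could then be passed as the labels argument of scale_y_continuous() in place of dat[[by]]. It will not help with a single long word, since strwrap() only breaks at whitespace.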

Thursday, 12 December 2013

A Prism Syntax Highlighter for R

(Update: description of a known issue, below the numbered list.) As I mentioned in a previous post, I've recently begun to use Lea Verou's fantastic PrismJS to do syntax highlighting on this blog.

As I could not find an R plugin for PrismJS, I wrote one myself. PrismJS highlighting is regex based, which means that while it sometimes makes mistakes with highlighting, it's easy to add in a new language.

You can see a demo here and get the script/style from my gist (I've also embedded the gist at the end of this post). It's similar to Yihui Xie's SyntaxHighlighter brush for R.

Without further ado, here's a demonstration:

n <- 10
for (i in seq_len(n)) {
    # say hello, many times
    message("hello world")
}

You can grab the files from my gist; you can also view an example page demonstrating 3 different themes I modified earlier.

To use it:

  1. Host the PrismJS and R PrismJS scripts somewhere and include them (note: you may wish to minify the R script and add it to the end of your PrismJS script so you only need to include one file):

    <script src='prism.js'></script> 
    <script src='prism.r.js'></script>
  2. Include the stylesheet of your choice. You may wish to also define style classes for .token.function, .token.variable and .token.namespace, to determine how functions, variables, and namespaces are styled (the stylesheets on my gist have this already):

    <link rel='stylesheet' href='prism.css' />
  3. Any R code you want to be syntax highlighted should have class="language-r" on it (note: on the code tag, not the pre tag). Works for inline code too.

Et voilà!

Update: There is a known issue whereby hash characters (#) within strings will be highlighted as comments. There are a number of workarounds mentioned in the link, though each has a use case that will break it (regex-based highlighters always have these problems).

You can view my gist via bl.ocks.org to see a larger code snippet, and play around with a few themes I modified to make the R look pretty: Okaidia (the one from PrismJS), Zenburn (based off the vim theme by Jami Nurminen), and Tomorrow Night 80s (like the one from RStudio, by Chris Kempson). I'm using Tomorrow Night 80s on my blog.

Sunday, 18 August 2013

Tiny Crochet Despicable Me Minion

After I watched the first Despicable Me, I decided to try to make one of the minions for my sister. Now that the second Despicable Me has come out I've been reminded to post my pattern.

At the time I was very much into making tiny little toys, so this pattern is specifically for a small minion - I used 4ply-equivalent cotton to make this minion and he stands 3 - 4cm tall when done. Since he's small (not only is he made in thinner yarn, he also has fewer stitches in him), he relies on a fair bit of embroidery rather than crochet to get all of his features (face, arms, feet,...). I.e., he is made smaller at the expense of detail (he might not look any good in wool or anything big).

If you wish to make a bigger minion or one with more detail/crochet in it, I recommend one of the many patterns out on the net (ravelry is a nice way to collect them all, though they are not all hosted on ravelry). In particular, I like this one from crochet-goods. My pattern is based on this one, just with a greatly reduced number of stitches.

Small minion

[Images: slightly too happy minion; back of minion]

(I was going for "very happy" with the expression but I overshot a little and now he looks like a "special" minion :P)
3-4cm tall (4ply cotton).
Work it in the round without joining.

Body (yellow)
R1: 6sc in magic ring. (6)
R2: inc (2sc in ea sc) around (12)
R3: *inc, sc* around (18)
R4-R12: sc around (18)
[add extra rows if you want a taller minion]
R13: *sc, dec* around (12). Stuff.
R14: dec around (6)
Finish off.

Overalls (blue)
R1: 6sc in magic ring (6)
R2: inc around (12)
R3: *inc, inc, sc* around (20)
R4-R5: sc around (20)
[add extra rows if you have a taller minion]
R6 - 8: 5sc, ch1, turn [this is worked back and forth, not in the round, to make the front of the overalls].

Assembly
Put the overalls on the minion.
Using the blue/overall yarn, sl st into one corner of the overall front panel.
Do a set of chain stitches to be the overall strap and attach to the back.
Repeat on the other side.

Finishing
Embroider a goofy expression onto the minion (but not as goofy as the one I did!).
Stick some small googly eyes on and embroider on the goggles strap and goggles frames (I used silver wire for my goggles frames).
Make the arms using a strand of yarn with a bead on the end.
Use big beads to make the feet.
Embroider the black and blue "G" logo on the front of the overalls and you're done!

Wednesday, 29 May 2013

Searching for a Syntax Highlighter for R

I've recently been looking at various syntax highlighters I could add to my blog to prettify it a bit more. The only criteria were that it had to be easy for me to incorporate into this blog (me being an interwebs ignoramus), and that it should support syntax highlighting for R and Javascript, the two languages I post a lot of snippets of.

At a (very) cursory glance, I found a few options:

R-Highlight

To use the highlight package for R:

  • include jQuery, the highlight script and CSS stylesheet;
  • initiate highlighting by using jQuery to select the nodes we wish to highlight and calling r_syntax_highlight();
  • has a list of recognised functions that will be highlighted (no regex there);
  • can even cause each function to link to its documentation (!)

However, the script seems to support the R language only (the highlight package when used from R appears to support many languages, but the script for webpages seems to only expose R syntax highlighting. I could be wrong here).

Syntax Highlighter

To use SyntaxHighlighter:

  • include the script shCore.js and stylesheets shCore.css, shThemeDefault.css, plus the script for each language you wish to syntax highlight;
  • initiate highlighting by labelling <pre> tags to be highlighted with class: "brush: <language>", and add a call to SyntaxHighlighter.all();
  • does not support R out of the box, but Yihui Xie has written a language definition for it (regex-based).

PrismJS

To use PrismJS:

  • include the script prism.js and stylesheet prism.css;
  • mark code to be highlighted using <code class="language-<language>">;
  • no need to call any functions to initiate highlighting;
  • does not support R out of the box, but languages can be defined using regexes.

Conclusion

They all seem pretty awesome, and I absolutely love R-highlight's ability to link to a function's documentation as well as marking it up. All three are fairly easily themeable. I strongly recommend you check them all out.

However, in the end I went with PrismJS, because

  • I couldn't work out how to use R-highlight for languages other than R (I often blog with Javascript snippets).
  • SyntaxHighlighter, while very popular, required me to host and link to many Javascript files (one per language). I didn't feel like doing this.
  • I love how PrismJS requires the class="language-X" part to be in the code tag, not the pre tag. A language is an attribute of code, not of a preformatted block, and as such you should mark a code block's language in the code tag, not the pre tag. Plus, this way of hinting the code language is recommended in the HTML5 specification.
  • PrismJS requires me to include just two files: the script and the stylesheet. In addition, it's tiny! With Javascript, R, HTML/XML, CSS and Bash highlighting support the file is all of 7.8kB. <3

I wrote a syntax highlighter for R and PrismJS; I'll post it tomorrow (or whenever I get round to it). Here's a sneak-preview:

# iterate a dis/like of green eggs and ham
helloSam <- function (times=10, like=FALSE) {
    str <- paste('I', ifelse(like, 'like', 'do not like'), 'green eggs and ham!')
    for (i in seq_len(times)) {
        message(str)
    }
}

Sunday, 26 May 2013

R gotcha - regular expressions

Just a quick post --- I came across this today and thought it was worth mentioning.

By default, the regular expression functions grep, gsub, regexpr, etc use extended regular expressions. By passing in perl=TRUE as an argument, one can use Perl regular expressions.

Note that in extended regular expressions, the . character matches the newline character '\n'. In Perl regular expressions, it doesn't.

grep('.', '\n')
## [1] 1
grep('.', '\n', perl=TRUE)
## integer(0)

Something to keep in mind if you use regular expressions in R with strings with embedded newlines and were having puzzling results.
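
If you need one behaviour or the other regardless of the regex flavour, here are two possible workarounds, offered as a sketch rather than the only way to do it: the inline (?s) flag asks the Perl engine to let . match newlines, and a negated character class excludes the newline explicitly in the default engine.

```r
x <- "\n"

default_dot <- grepl(".", x)                    # default (extended) regex
perl_dot    <- grepl(".", x, perl = TRUE)       # Perl regex
perl_dotall <- grepl("(?s).", x, perl = TRUE)   # Perl regex, dot matches newline
no_newline  <- grepl("[^\n]", x)                # default regex, newline excluded

print(c(default_dot = default_dot, perl_dot = perl_dot,
        perl_dotall = perl_dotall, no_newline = no_newline))
```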

Wednesday, 17 April 2013

Installing RStudio Server on Fedora 18

I'm trying to install RStudio Server on Fedora 18 (64bit) but keep getting an error about libcrypto.so.6 and libssl.so.6. (I didn't enable EPEL as I am not on RedHat/CentOS).

Following the installation instructions, I:

wget http://download2.rstudio.org/rstudio-server-0.97.336-x86_64.rpm
sudo yum install --nogpgcheck rstudio-server-0.97.336-x86_64.rpm

But I get the error:

Error: Package: rstudio-server-0.97.336-1.x86_64 (/rstudio-server-0.97.336-x86_64)
           Requires: libcrypto.so.6()(64bit)
Error: Package: rstudio-server-0.97.336-1.x86_64 (/rstudio-server-0.97.336-x86_64)
           Requires: libssl.so.6()(64bit)

If I look in /usr/lib64 I have the following (both come from the openssl-libs package, as reported by yum provides /usr/lib64/libssl.so.10):

/usr/lib64/libcrypto.so.10
/usr/lib64/libssl.so.10

So it appears my package versions are too new for RStudio Server (!).

I decided to try the hacky solution of making symlinks from the old library names to the new ones (hoping that the new versions are backwards-compatible):

sudo ln -s /usr/lib64/libcrypto.so.10 /usr/lib64/libcrypto.so.6
sudo ln -s /usr/lib64/libssl.so.10 /usr/lib64/libssl.so.6

Now I attempt to install again, and use rpm --nodeps instead of yum to force installation:

sudo rpm -ivh --nodeps rstudio-server-0.97.336-x86_64.rpm

It works!

Preparing...                          ################################# [100%]
Updating / installing...
   1:rstudio-server-0.97.336-1        ################################# [100%]
rsession: no process found
Stopping rstudio-server (via systemctl):                   [  OK  ]
Starting rstudio-server (via systemctl):                   [  OK  ]

(Note - if it's still complaining to you about not finding libraries, try

sudo ldd /usr/lib/rstudio-server/bin/rserver | grep 'not found'

You might get something like:

libssl.so.6 => not found
libcrypto.so.6 => not found

This will tell you which libraries you need to make links for.)

I can test whether it worked OK by opening a browser and pointing it to http://127.0.0.1:8787:

Huzzah! RStudio Server!

To enable external access I had to open port 8787 in my firewall (you could use the firewall applet for this instead of the command line):

iptables -A INPUT -p tcp --dport 8787 -j ACCEPT

Now I can continue with the rest of the instructions! Yay!

Tuesday, 16 April 2013

Getting RStudio to include your `R_LIBS_USER` in its library paths.

I recently installed RStudio Desktop version on my Linux computer, as I wanted to test it out. Previously I'd been happily using R from the command-line.

However, it kept complaining that I didn't have the knitr package installed.

Starting R from the command-line, I found that I did have knitr installed - library(knitr) worked fine! But library(knitr) from RStudio gave the error:

Error in library(knitr) : there is no package called ‘knitr’

Upon further inspection, I realised that my package library paths were different between command-line R and RStudio, despite the R executable being identical between the two.

My usual R library for command-line R is ~/R/library, and this was in the .libPaths() as expected (executed from R from the command line).

However, when I executed .libPaths() from RStudio, I got:

[1] "/usr/local/lib/R/site-library" "/usr/lib/R/site-library"
[3] "/usr/lib/R/library"            "/usr/lib/rstudio/R/library"

So why did the same R binary produce different library paths?

It turns out that I set my default path ~/R/library by setting my R_LIBS_USER environment variable in my .bashrc file:

export R_LIBS_USER=$HOME/R/library

but RStudio was not reading my .bashrc file when it started up (makes sense I guess, as it's not running from the terminal).

The solution was to create a file ~/.Renviron and set the R_LIBS_USER variable there. R looks for this file upon starting up to set environment variables (see also ?Startup):

R_LIBS_USER=~/R/library

(Note - I could also just do .libPaths(c('~/R/library', .libPaths())) in an .Rprofile file, but I don't use those in general).

Now I start up RStudio and hey presto! It all works.
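
A quick way to check the result is to run the same few lines in both command-line R and RStudio and compare the output. This is a diagnostic sketch only, and the variable names are made up for illustration. Note that .libPaths() only lists directories that actually exist, so a missing directory will show up as not being on the library path.

```r
# R_LIBS_USER may hold several paths separated by the platform's path separator
user_libs <- strsplit(Sys.getenv("R_LIBS_USER"),
                      .Platform$path.sep, fixed = TRUE)[[1]]

# expand "~" so the paths can be compared with what .libPaths() reports
expanded <- path.expand(user_libs)
on_path <- normalizePath(expanded, mustWork = FALSE) %in%
    normalizePath(.libPaths(), mustWork = FALSE)

print(data.frame(path = user_libs,
                 exists = dir.exists(expanded),
                 on_lib_path = on_path))
```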

Thanks to the folk at StackOverflow, in particular flodel and Dirk Eddelbuettel, who helped me work this out (the question has probably been deleted since then as it was too localized and probably should have been asked at the RStudio support page).