Thursday, 26 June 2014

ggpie: pie graphs in ggplot2

How does one make a pie chart in R using ggplot2? (If you're impatient, skip to the final code in the TL;DR section at the end.)

I know, I know, pie charts are often a poor way of displaying data. It is hard to visually compare the relative sizes of slices (particularly the smaller ones, or slices that are separated by larger ones), and you get no sense of scale.

However, sometimes a good ol' pie chart can convey your point in a way that many of the general public will find easy to understand. For example, this is how I've spent my "work" hours this week so far:

(I manually inserted the line break into "marking exams" so it wouldn't get cut off - not sure how to automatically do this).
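One possible way to automate that line break, offered as a sketch rather than what was used for the plot above: base R's strwrap() splits a string into lines shorter than a given width, and the pieces can be glued back together with "\n". The helper name wrap_label and the width of 10 are made up for illustration.

```r
# Hypothetical helper: wrap each label so that no line reaches `width` characters.
wrap_label <- function(x, width = 10) {
    vapply(
        as.character(x),
        function(s) paste(strwrap(s, width = width), collapse = "\n"),
        character(1),
        USE.NAMES = FALSE
    )
}

# A label longer than the width is broken at a space.
wrapped <- wrap_label("Marking exams", width = 10)
cat(wrapped, "\n")
```

The wrapped strings could then be used as the activity names before plotting.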

Terrible, I know. But the plot conveys quite clearly the point I want to make: I wasted away almost half (!) of my week so far playing nethack when I should have been doing my PhD, which is much more time than I have spent on my PhD itself!

(Aside: nethack is a most excellent ASCII game that can be obtained here, or from your repositories on most Linux systems. In my defence, during the month of June the Junethack tournament is run and clan overcaffeinated (me) is in a deadly battle with clan demilichens to win the "most unique deaths" trophy. And this is the last week of June).

How to do it

My data looks like this (I've been using the hamster time tracker, though I keep forgetting to track things):

How I've spent my PhD hours this week

activity             time (hours)
Nethack              15.2
PhD                  7.4
Marking exams        4
Meetings             2.3
Lunch                2
Writing this post    1.5
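The post does not show how the data frame df was built, so here is a minimal sketch of one way to construct it from the table above. Storing activity as a factor whose levels follow the row order is my assumption; it keeps the slices in the order listed rather than alphabetical order.

```r
# Sketch: rebuild the data shown in the table (hours per activity).
df <- data.frame(
    activity = c("Nethack", "PhD", "Marking exams",
                 "Meetings", "Lunch", "Writing this post"),
    time = c(15.2, 7.4, 4, 2.3, 2, 1.5),
    stringsAsFactors = FALSE
)

# Assumption: fix the factor levels to the row order.
df$activity <- factor(df$activity, levels = df$activity)

print(df)
```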

My first attempt at building a pie chart of this data follows the ggplot2 documentation for coord_polar and this excellent post on r-chart. There are also a number of relevant questions on StackOverflow.

First we construct a stacked bar chart, coloured by the activity type (fill=activity). Note that x=1 is a dummy variable purely so that ggplot has an x variable to plot by (we will remove the label later).

(Aside: if x is a factor (e.g. factor("dummy")) for some reason the bar does not get full width, and the resulting pie chart has a funny little hole in the middle).

p <- ggplot(df, aes(x=1, y=time, fill=activity)) +
        geom_bar(stat="identity") +
        ggtitle("How I've spent my PhD hours this week")
print(p)

Then we use coord_polar() to turn it into a pie chart:

p <- p + coord_polar(theta='y')
print(p)

(Aside: does anyone find it weird that although my x axis was 1 and my y axis was time, on the resultant graph the x axis is now 'time' and the y axis is now 1?)

Let's tweak this so it doesn't look so bad.

First, some pure aesthetics: let's outline each slice of the pie in black using color='black' in geom_bar(). This also adds an ugly black diagonal line and outline to each square of the legend, so we remove that.

# We have to start the plot again because `color='black'` goes into geom_bar
p <- ggplot(df, aes(x=1, y=time, fill=activity)) +
        ggtitle("How I've spent my PhD hours this week") +
        coord_polar(theta='y')
p <- p +
        # black border around pie slices
        geom_bar(stat="identity", color='black') +
        # remove black diagonal line from legend
        guides(fill=guide_legend(override.aes=list(colour=NA)))
print(p)

Now, remove the axes:

  • both the axis labels ("time", "1"),
  • the tick marks on the vertical axis,
  • the tick labels on the vertical axis (0.75, 1.00, 1.25).

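Note: although 1 was our x variable, you remove its tick labels by setting axis.text.y. With coord_polar(theta='y'), y is mapped to the angle and x to the radius; the radius is drawn along the vertical axis of the plot, so the theme treats it as the y axis.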

p <- p +
    theme(axis.ticks=element_blank(),  # the axis ticks
          axis.title=element_blank(),  # the axis labels
          axis.text.y=element_blank()) # the 0.75, 1.00, 1.25 labels.
print(p)

Now, I should also remove the '0', '5', '10', '15' tick marks/labels, being the cumulative number of hours spent so far, since they're not meaningful.

However, I want to label each slice of the pie, and it is convenient to put my labels in place of the '0', '5', etc.

In terms of their position, they should be located at the midpoint of each pie slice. Think back to the stacked bar chart I produced at the start, and recall that the y axis (time) shows cumulative hours spent.

This means the y coordinate of the end of each slice is given by cumsum(df$time) (cumulative sum of time spent so far), and so the coordinate of the midpoint of each slice is given by:

y.breaks <- cumsum(df$time) - df$time/2
y.breaks
## [1]  7.60 18.90 24.60 27.75 29.90 31.65
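As a sanity check on that formula, here is a small sketch that computes the start and end of every slice explicitly and confirms that the midpoint is the same thing as cumsum(time) - time/2. The variable names starts, ends and midpoints are mine, not from the code above.

```r
# Hours per activity, in the same order as the table.
time <- c(15.2, 7.4, 4, 2.3, 2, 1.5)

ends      <- cumsum(time)        # where each slice finishes
starts    <- ends - time         # where each slice begins
midpoints <- (starts + ends) / 2 # halfway along each slice

# Same values as the one-line formula.
stopifnot(isTRUE(all.equal(midpoints, cumsum(time) - time / 2)))
print(midpoints)
```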

In order to implement this in the pie chart, we use scale_y_continuous with the breaks argument being the coordinates we've just calculated, and the labels argument being the activity name.

p <- p +
    # prettiness: make the labels black
    theme(axis.text.x=element_text(color='black')) +
    scale_y_continuous(
        breaks=y.breaks,   # where to place the labels
        labels=df$activity # the labels
    )
print(p)

Altogether now:

p <- ggplot(df, aes(x=1, y=time, fill=activity)) +
        ggtitle("How I've spent my PhD hours this week") +
        # black border around pie slices
        geom_bar(stat="identity", color='black') +
        # remove black diagonal line from legend
        guides(fill=guide_legend(override.aes=list(colour=NA))) +
        # polar coordinates
        coord_polar(theta='y') +
        # label aesthetics
        theme(axis.ticks=element_blank(),  # the axis ticks
              axis.title=element_blank(),  # the axis labels
              axis.text.y=element_blank(), # the 0.75, 1.00, 1.25 labels
              axis.text.x=element_text(color='black')) +
        # pie slice labels
        scale_y_continuous(
            breaks=cumsum(df$time) - df$time/2,
            labels=df$activity
        )

TL;DR

I've tied it together into a function ggpie:

library(ggplot2)
# ggpie: draws a pie chart.
# give it:
# * `dat`: your dataframe
# * `by` {character}: the name of the fill column (factor)
# * `totals` {character}: the name of the column that tracks
#    the time spent per level of `by` (percentages work too).
# returns: a plot object.
ggpie <- function (dat, by, totals) {
    ggplot(dat, aes_string(x=factor(1), y=totals, fill=by)) +
        geom_bar(stat='identity', color='black') +
        guides(fill=guide_legend(override.aes=list(colour=NA))) + # removes black borders from legend
        coord_polar(theta='y') +
        theme(axis.ticks=element_blank(),
            axis.text.y=element_blank(),
            axis.text.x=element_text(colour='black'),
            axis.title=element_blank()) +
        scale_y_continuous(breaks=cumsum(dat[[totals]]) - dat[[totals]] / 2, labels=dat[[by]])
}

For example:

library(grid) # for `unit`
ggpie(df, by='activity', totals='time') +
    ggtitle("A fun but wasteful week.") +
    theme(axis.ticks.margin=unit(0,"lines"),
          plot.margin=rep(unit(0, "lines"),4))

(Note: for some reason the pie plots have an unreasonably large amount of white space, and the theme(*.margin) settings are an attempt to control that. However, I still get a lot of vertical space that I'm not sure how to compress).

Clearly the labels could do with more work (if they are too long they go out of the plot boundary, or they bump into the pie chart), but not bad for an hour and a half's work :)
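For readers on a more recent ggplot2 (I am assuming 3.0 or later), here is a hedged sketch of how the same idea might be written. Two things differ from the function above: aes_string() is deprecated in current releases in favour of the .data pronoun, and, as far as I can tell, since ggplot2 2.2.0 bars stack in the reverse order of the groups, so the cumsum() label positions only line up if the stack is reversed back. The names ggpie_sketch and week are hypothetical, and the function assumes each category appears in exactly one row.

```r
library(ggplot2)

# Hypothetical variant of ggpie for recent ggplot2 releases.
ggpie_sketch <- function(dat, by, totals) {
    # Fix the level order to the row order so slices follow the rows
    # (assumes each value of `by` occurs in exactly one row).
    dat[[by]] <- factor(dat[[by]], levels = unique(dat[[by]]))
    # Midpoint of each slice along the stacked axis.
    mid <- cumsum(dat[[totals]]) - dat[[totals]] / 2
    ggplot(dat, aes(x = 1, y = .data[[totals]], fill = .data[[by]])) +
        # reverse = TRUE stacks the first row from zero, matching cumsum()
        geom_col(colour = "black",
                 position = position_stack(reverse = TRUE)) +
        coord_polar(theta = "y") +
        labs(fill = by) +
        theme(axis.ticks = element_blank(),
              axis.title = element_blank(),
              axis.text.y = element_blank(),
              axis.text.x = element_text(colour = "black")) +
        scale_y_continuous(breaks = mid,
                           labels = as.character(dat[[by]]))
}

week <- data.frame(
    activity = c("Nethack", "PhD", "Marking exams",
                 "Meetings", "Lunch", "Writing this post"),
    time = c(15.2, 7.4, 4, 2.3, 2, 1.5)
)
p <- ggpie_sketch(week, by = "activity", totals = "time")
```

Calling print(p) draws the chart; I have not checked this against every ggplot2 version, so treat it as a starting point.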

17 comments:

  1. Thanks. I made a few modifications to have a blue brewer scale and also to get rid of the background grey and the legend.
    The ggpie is now:
    ggpie <- function (dat, by, totals) {
        ggplot(dat, aes_string(x=factor(1), y=totals, fill=by)) +
            geom_bar(stat='identity', color='black') +
            scale_fill_brewer() +
            guides(fill=guide_legend(override.aes=list(colour=NA))) + # removes black borders from legend
            coord_polar(theta='y') +
            theme(axis.ticks=element_blank(),
                  axis.text.y=element_blank(),
                  axis.text.x=element_text(colour='black'),
                  axis.title=element_blank(),
                  legend.position="none") +
            scale_y_continuous(breaks=cumsum(dat[[totals]]) - dat[[totals]] / 2, labels=dat[[by]])
    }

    When called, I add this:
    theme(panel.background = element_rect(fill = "white")) +

  2. thank you and this blogger very much, I quite like this ggpie function!

  3. Awesome!!!!!!!!

  4. Thanks fella.
    Very helpful.

    Jah bless you!

  5. "I wasted away almost half (!) of my week so far playing nethack when I should have been doing my PhD, which is much more time than I have spent on my PhD itself!"

    And for me it was "UFO Enemy Unknown". Hopefully I can catch up a bit because of your post... [=

  6. Version refined to better display small slices/many slices, with optional plot title and palette:

    ggpie <- function (dat, by, totals, main=NA, pal=NA) {
        p = ggplot(dat, aes_string(x=factor(1), y=totals, fill=by)) +
            geom_bar(width=1, stat='identity', color='black') +
            guides(fill=guide_legend(override.aes=list(colour=NA))) + # removes black borders from legend
            coord_polar(theta='y') +
            theme(axis.ticks=element_blank(),
                  axis.text.y=element_blank(),
                  axis.text.x=element_blank(),
                  axis.title=element_blank(),
                  panel.grid=element_blank()) +
            scale_y_continuous(breaks=cumsum(dat[[totals]]) - dat[[totals]] / 2, labels=dat[[by]]) +
            theme(panel.background = element_rect(fill = "white"))
        if (!is.na(pal[1])) {
            p = p + scale_fill_manual(values=pal)
        }
        if (!is.na(main)) {
            p = p + ggtitle(main)
        }
        p
    }

  7. Thanks to all for sharing. My own tweak on the subject below, for reference:


    df <- structure(list(id = structure(c(5L, 6L, 7L, 8L, 2L, 3L, 4L, 1L,
    10L, 11L, 9L), .Label = c("Chrome 10.0", "Firefox 3.5", "Firefox 3.6",
    "Firefox 4.0", "MSIE 6.0", "MSIE 7.0", "MSIE 8.0", "MSIE 9.0",
    "Opera 11.x", "Safari 4.0", "Safari 5.0"), class = "factor"),
    variable = structure(c(3L, 3L, 3L, 3L, 2L, 2L, 2L, 1L, 5L,
    5L, 4L), .Label = c("Chrome", "Firefox", "MSIE", "Opera",
    "Safari"), class = "factor"), value = c(10.85, 7.35, 33.06,
    2.81, 1.58, 13.12, 5.43, 9.91, 1.42, 4.55, 1.65)), .Names = c("id",
    "variable", "value"), row.names = c(NA, -11L), class = "data.frame")

    ggpie <- function (data, fill, value, palette = 'Set1', legend = TRUE) {
        require('ggplot2')
        require('RColorBrewer')
        require('grid') # for `unit`
        n <- length(unique(data[,fill]))
        pal <- colorRampPalette(brewer.pal(9, palette))
        p <- ggplot(data = data,
                    aes_string(x = 1, y = value, fill = fill)) +
            geom_bar(stat = 'identity', color = 'black') +
            coord_polar(theta = 'y') + # make a pie chart with coord_polar()
            # scale_fill_brewer() + # max 9 colors --> use scale_fill_manual
            scale_fill_manual(values = pal(n)) +
            xlab(NULL) + ylab(NULL) +
            theme(axis.ticks = element_blank(),
                  axis.text.y = element_blank(),
                  axis.text.x = element_blank(),
                  axis.title = element_blank(),
                  legend.key = element_blank(),
                  panel.background = element_rect(fill = 'snow'),
                  panel.grid = element_blank(),
                  panel.margin = unit(c(0, 0, 0, 0), 'lines'),
                  plot.margin = unit(c(0, 0, 0, 0), 'lines'))
        if(!legend) {
            # without a legend, label the slices directly;
            # [[ ]] extracts each column as a vector rather than a data frame
            p <- p + theme(legend.position = 'none') +
                theme(axis.text.x = element_text(colour = 'black')) +
                scale_y_continuous(breaks = cumsum(data[[value]]) - data[[value]]/2,
                                   labels = data[[fill]])
        }
        return(p)
    }

    ggpie(data = df, value = 'value', fill = 'id', legend = FALSE)
    ggpie(data = df, value = 'value', fill = 'id', legend = TRUE)

  8. To remove the grey, circular outline (formerly the grid): ``panel.grid = element_blank()``. By setting ``xlab`` and ``ylab`` to NULL, you remove blank space. I'm still struggling to understand how/when to use ``aes_string()`` versus plain ``aes()`` and when to set ``1`` to a factor or not, e.g. ``aes(x = factor(1), ...`` versus ``aes_string(x = 1, ...``

  9. this is one of the few good resources available for creating a pie chart in ggplot2

  10. Thanks for sharing. I couldn't find a solution to the problem of overlapping labels: no matter what I do, the labels still overlap. Could you recommend something for this problem?

  11. Thank you. This is the one place online that told me how to create a border around my pie charts.

  12. I've made some minor modifications to change the order of the categories and the annotation:

    ggpie <- function (data, id = "id", fill = "fill", value = "value",
                       main = "Pie Chart", palette = 'Set1', legend = TRUE) {
        # Create Pie Chart
        #
        # Args:
        #   data: data.frame with at least three columns
        #   id: group for the pie chart, only support 1 id
        #   fill: the categories for the group, factor
        #   value: the percentage value for each category
        #   main: the title of the plot
        #   palette: ColorBrewer
        #   legend: whether show the legend, TRUE or FALSE
        #
        # Returns:
        #   the ggplot2 object
        #
        # Error handling
        library('ggplot2')
        library('RColorBrewer')
        suppressPackageStartupMessages(library(dplyr))
        stopifnot(all(c(id, fill, value) %in% names(data)))
        idLength <- length(unique(data[, id]))
        stopifnot(idLength == 1)
        ## transform data
        df <- data %>%
            mutate_(id = id,
                    fill = fill,
                    value = value) %>%
            select(id, fill, value)
        df <- with(df, df[order(fill), ])
        df <- plyr::ddply(df, "id", transform,
                          label_y = cumsum(value) - 0.5 * value)
        ## parameters
        n <- length(unique(data[, fill]))
        pal <- colorRampPalette(brewer.pal(9, palette))
        p <- ggplot(df) +
            geom_col(aes(x = 1, y = value, fill = fill),
                     position = position_stack(reverse = TRUE),
                     color = "white") +
            annotate("text", x = 1.6, y = df$label_y,
                     label = df$fill) +
            scale_fill_manual(values = pal(n)) +
            ggtitle(main) +
            coord_polar(theta = 'y') + # make a pie chart with coord_polar()
            # scale_fill_brewer() + # max 9 colors --> use scale_fill_manual
            xlab(NULL) + ylab(NULL) +
            theme(axis.ticks = element_blank(),
                  axis.text = element_blank(),
                  axis.title = element_blank(),
                  legend.key = element_blank(),
                  plot.title = element_text(hjust = .5),
                  panel.background = element_rect(fill = 'snow'),
                  panel.grid = element_blank(),
                  panel.spacing = unit(c(0, 0, 0, 0), 'lines'))
        if(!legend) {
            p <- p + theme(legend.position = 'none')
        }
        return(p)
    }

  13. One more possible enhancement: use labs(title = "some title", subtitle = "A subtitle", caption = "Source: blah blah") to offer more information.

  14. Thank you very much for this.
