Mathematical Coffee: R


Thursday, 26 June 2014

ggpie: pie graphs in ggplot2

How does one make a pie chart in R using ggplot2? (If you're impatient: see the final code here).

I know, I know, pie charts are often not very good ways of displaying data. It is hard to compare the relative sizes of slices by eye (particularly the smaller ones, and particularly when they are separated by larger slices), and you get no sense of scale.

However, sometimes a good ol' pie chart can convey your point in a way that many of the general public will find easy to understand. For example, this is how I've spent my "work" hours this week so far:

(I manually inserted the line break into "marking exams" so it wouldn't get cut off - not sure how to automatically do this).

Terrible, I know. But the plot conveys quite clearly the point I want to make: I wasted away almost half (!) of my week so far playing nethack when I should have been doing my PhD, which is much more time than I have spent on my PhD itself!

(Aside: nethack is a most excellent ASCII game that can be obtained here, or from your repositories on most Linux systems. In my defence, during the month of June the Junethack tournament is run and clan overcaffeinated (me) is in a deadly battle with clan demilichens to win the "most unique deaths" trophy. And this is the last week of June).

How to do it

My data looks like this (I've been using the hamster time tracker, though I keep forgetting to track things):

How I've spent my PhD hours this week
activity	time
Nethack	15.2
PhD	7.4
Marking exams	4
Meetings	2.3
Lunch	2
Writing this post	1.5
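
The plotting code below assumes this table is already loaded as a data frame called df. The post does not show how df was created, so the following is only a minimal sketch of one way to enter it by hand (the column names match the table; everything else is an assumption):

```r
# Assumed reconstruction: the post never shows how `df` was created.
df <- data.frame(
    activity = c("Nethack", "PhD", "Marking exams",
                 "Meetings", "Lunch", "Writing this post"),
    time = c(15.2, 7.4, 4, 2.3, 2, 1.5)
)

# Total hours tracked this week
total.hours <- sum(df$time)
```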

My first attempt at building a pie chart of this data follows the ggplot2 documentation for coord_polar and this excellent post on r-chart. There are also a number of relevant questions on StackOverflow.

First we construct a stacked bar chart, coloured by the activity type (fill=activity). Note that x=1 is a dummy variable purely so that ggplot has an x variable to plot by (we will remove the label later).

(Aside: if x is a factor (e.g. factor("dummy")) for some reason the bar does not get full width, and the resulting pie chart has a funny little hole in the middle).

p <- ggplot(df, aes(x=1, y=time, fill=activity)) +
        geom_bar(stat="identity") +
        ggtitle("How I've spent my PhD hours this week")
print(p)

Then we use coord_polar() to turn it into a pie chart:

p <- p + coord_polar(theta='y')
print(p)

(Aside: does anyone find it weird that although my x axis was 1 and my y axis was time, on the resultant graph the x axis is now 'time' and the y axis is now 1?)

Let's tweak this so it doesn't look so bad.

First, some pure aesthetics: let's outline each slice of the pie in black using color='black' in geom_bar(). This also puts an ugly black diagonal line and outline on each square of the legend, so we remove that.

# We have to start the plot again because `color='black'` goes into geom_bar
p <- ggplot(df, aes(x=1, y=time, fill=activity)) +
        ggtitle("How I've spent my PhD hours this week") +
        coord_polar(theta='y')
p <- p +
        # black border around pie slices
        geom_bar(stat="identity", color='black') +
        # remove black diagonal line from legend
        guides(fill=guide_legend(override.aes=list(colour=NA)))
print(p)

Now, remove the axes:

both the axis labels ("time", "1"),
the tick marks on the vertical axis,
the tick labels on the vertical axis (0.75, 1.00, 1.25).

Note - for some reason, although 1 was our x variable, you remove its tick label by setting axis.text.y...

p <- p +
    theme(axis.ticks=element_blank(),  # the axis ticks
          axis.title=element_blank(),  # the axis labels
          axis.text.y=element_blank()) # the 0.75, 1.00, 1.25 labels.
print(p)

Now, I should also remove the '0', '5', '10', '15' tick marks/labels, being the cumulative number of hours spent so far, since they're not meaningful.

However, I want to label each slice of the pie, and it is convenient to put my labels in place of the '0', '5', etc.

In terms of their position, they should be located at the midpoint of each pie slice. Think back to the stacked bar chart I produced at the start, and recall that the y axis (time) shows cumulative hours spent.

This means the y coordinate of the end of each slice is given by cumsum(df$time) (cumulative sum of time spent so far), and so the coordinate of the midpoint of each slice is given by:

y.breaks <- cumsum(df$time) - df$time/2
y.breaks

## [1]  7.60 18.90 24.60 27.75 29.90 31.65
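
As a sanity check on that formula, the midpoints can also be computed the long way, as the average of where each slice starts and ends. This is only a sketch; the hours are re-entered by hand from the table above so that the snippet runs on its own:

```r
# Same figures as the table of hours above, entered by hand.
time <- c(15.2, 7.4, 4, 2.3, 2, 1.5)

slice.end   <- cumsum(time)        # where each slice finishes
slice.start <- slice.end - time    # where each slice begins
midpoint    <- (slice.start + slice.end) / 2

# The long-hand midpoint agrees with the shortcut cumsum(time) - time/2.
stopifnot(isTRUE(all.equal(midpoint, cumsum(time) - time / 2)))
midpoint
```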

In order to implement this in the pie chart, we use scale_y_continuous with the breaks argument being the coordinates we've just calculated, and the labels argument being the activity name.

p <- p +
    # prettiness: make the labels black
    theme(axis.text.x=element_text(color='black')) +
    scale_y_continuous(
        breaks=y.breaks,   # where to place the labels
        labels=df$activity # the labels
    )
print(p)

Altogether now:

p <- ggplot(df, aes(x=1, y=time, fill=activity)) +
        ggtitle("How I've spent my PhD hours this week") +
        # black border around pie slices
        geom_bar(stat="identity", color='black') +
        # remove black diagonal line from legend
        guides(fill=guide_legend(override.aes=list(colour=NA))) +
        # polar coordinates
        coord_polar(theta='y') +
        # label aesthetics
        theme(axis.ticks=element_blank(),  # the axis ticks
              axis.title=element_blank(),  # the axis labels
              axis.text.y=element_blank(), # the 0.75, 1.00, 1.25 labels
              axis.text.x=element_text(color='black')) +
        # pie slice labels
        scale_y_continuous(
            breaks=cumsum(df$time) - df$time/2,
            labels=df$activity
        )

TL;DR

I've tied it together into a function ggpie:

library(ggplot2)
# ggpie: draws a pie chart.
# give it:
# * `dat`: your dataframe
# * `by` {character}: the name of the fill column (factor)
# * `totals` {character}: the name of the column that tracks
#    the time spent per level of `by` (percentages work too).
# returns: a plot object.
ggpie <- function (dat, by, totals) {
    ggplot(dat, aes_string(x=factor(1), y=totals, fill=by)) +
        geom_bar(stat='identity', color='black') +
        guides(fill=guide_legend(override.aes=list(colour=NA))) + # removes black borders from legend
        coord_polar(theta='y') +
        theme(axis.ticks=element_blank(),
            axis.text.y=element_blank(),
            axis.text.x=element_text(colour='black'),
            axis.title=element_blank()) +
        scale_y_continuous(breaks=cumsum(dat[[totals]]) - dat[[totals]] / 2, labels=dat[[by]])
}

For example:

library(grid) # for `unit`
ggpie(df, by='activity', totals='time') +
    ggtitle("A fun but wasteful week.") +
    theme(axis.ticks.margin=unit(0,"lines"),
          plot.margin=rep(unit(0, "lines"),4))

(Note: for some reason the pie plots have an unreasonably large amount of white space, and the theme(*.margin) settings are an attempt to control that. However, I still get a lot of vertical space that I'm not sure how to compress).

Clearly the labels could do with more work (if they are too long they go out of the plot boundary, or they bump into the pie chart), but not bad for an hour and a half's work :)
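
One possible way to wrap long labels automatically is base R's strwrap. The helper below is a hypothetical sketch, not part of ggpie: the name wrap.labels and the default width of 10 characters are assumptions chosen for illustration.

```r
# Hypothetical helper (not part of ggpie): break each label onto several
# lines so that, where possible, no line reaches `width` characters.
wrap.labels <- function(labels, width = 10) {
    vapply(
        as.character(labels),
        function(s) paste(strwrap(s, width = width), collapse = "\n"),
        character(1),
        USE.NAMES = FALSE
    )
}

wrapped <- wrap.labels(c("Marking exams", "Lunch"))
```

The wrapped vector could then be passed as the labels argument of scale_y_continuous in place of the raw activity names, though I have not tuned the width against the plot margins.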

Thursday, 12 December 2013

A Prism Syntax Highlighter for R

(Update: description of a known issue, below the numbered list.) As I mentioned in a previous post, I've recently begun to use Lea Verou's fantastic PrismJS to do syntax highlighting on this blog.

As I could not find an R plugin for PrismJS, I wrote one myself. PrismJS highlighting is regex based, which means that while it sometimes makes mistakes with highlighting, it's easy to add in a new language.

You can see a demo here and get the script/style from my gist (I've also embedded the gist at the end of this post). It's similar to Yihui Xie's SyntaxHighlighter brush for R.

Without further ado, here's a demonstration:

n <- 10
for (i in seq_len(n)) {
    # say hello, many times
    message("hello world")
}

You can grab the files from my gist; you can also view an example page demonstrating 3 different themes I modified earlier.

To use it:

Host the PrismJS and R PrismJS scripts somewhere and include them (note: you may wish to minify the R script and add it to the end of your PrismJS script so you only need to include one file):
```
<script src='prism.js'></script> 
<script src='prism.r.js'></script>
```
Include the stylesheet of your choice. You may wish to also define style classes for .token.function, .token.variable and .token.namespace, to determine how functions, variables, and namespaces are styled (the stylesheets on my gist have this already):
```
<link rel='stylesheet' href='prism.css' />
```
Any R code you want to be syntax highlighted should have class="language-r" on it (note: on the code tag, not the pre tag). Works for inline code too.

Et Voila!

Update: There is a known issue whereby hash tags within strings will be highlighted as comments. There are a number of workarounds mentioned in the link, though each has a use that will break it (regex-based highlighters always have these problems).
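
To see why this happens, here is a small illustration in R using a deliberately simplified comment rule. The pattern "#.*" is a stand-in assumed for illustration; it is not the regex the plugin actually uses.

```r
# Simplified stand-in for a regex-based comment rule (an assumption for
# illustration, not the pattern used in the actual plugin).
naive.comment <- "#.*"

line <- 'msg <- "issue #42 is fixed"  # a real comment'
hit <- regmatches(line, regexpr(naive.comment, line))

# The match starts at the hash inside the string, not at the real comment.
startsWith(hit, "#42")
```

A regex on its own has no idea that the first hash sits inside a string, which is why every workaround ends up with some input that breaks it.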

You can view my gist via bl.ocks.org to see a larger code snippet, and play around with a few themes I modified to make the R look pretty: Okaidia (the one from PrismJS), Zenburn (based off the vim theme by Jami Nurminen), and Tomorrow Night 80s (like the one from RStudio, by Chris Kempson). I'm using Tomorrow Night 80s on my blog.

Wednesday, 29 May 2013

Searching for a Syntax Highlighter for R

I've recently been looking at various syntax highlighters I could add to my blog to prettify it a bit more. The only criteria were that it had to be easy for me to incorporate into this blog (me being an interwebs ignoramus), and that it should support syntax highlighting for R and Javascript, the two languages I post a lot of snippets of.

At a (very) cursory glance, I found a few options:

R-Highlight

To use the highlight package for R:

include jQuery, the highlight script and CSS stylesheet;
initiate highlighting by using jQuery to select the nodes we wish to highlight and calling r_syntax_highlight();
has a list of recognised functions that will be highlighted (no regex there);
can even cause each function to link to its documentation (!)

However, the script seems to support the R language only (the highlight package when used from R appears to support many languages, but the script for webpages seems to only expose R syntax highlighting. I could be wrong here).

Syntax Highlighter

To use SyntaxHighlighter:

include the script shCore.js and stylesheets shCore.css, shThemeDefault.css, plus the script for each language you wish to syntax highlight;
initiate highlighting by labelling <pre> tags to be highlighted with class: "brush: <language>", and add a call to SyntaxHighlighter.all();
does not support R out of the box, but Yihui Xie has written a language definition for it (regex-based).

PrismJS

To use PrismJS:

include the script prism.js and stylesheet prism.css;
mark code to be highlighted using <code class="language-<language>">;
no need to call any functions to initiate highlighting;
does not support R out of the box, but languages can be defined using regexes.

Conclusion

They all seem pretty awesome, and I absolutely love R-highlight's ability to link to a function's documentation as well as marking it up. All three are fairly easily themeable. I strongly recommend you check them all out.

However, in the end I went with PrismJS, because

I couldn't work out how to use R-highlight for languages other than R (I often blog with Javascript snippets).
SyntaxHighlighter, while very popular, required me to host and link to many Javascript files (one per language). I didn't feel like doing this.
I love how PrismJS requires the class="language-X" part to be in the code tag, not the pre tag. A language is an attribute of code, not of a preformatted block, and as such you should mark a code block's language in the code tag, not the pre tag. Plus, this way of hinting the code language is recommended in the HTML5 specification.
PrismJS requires me to include just two files: the script and the stylesheet. In addition, it's tiny! With Javascript, R, HTML/XML, CSS and Bash highlighting support the file is all of 7.8kB. <3

I wrote a syntax highlighter for R and PrismJS; I'll post it tomorrow (or whenever I get round to it). Here's a sneak-preview:

# iterate a dis/like of green eggs and ham
helloSam <- function (times=10, like=F) {
    str <- paste('I', ifelse(like, 'like', 'do not like'), 'green eggs and ham!')
    for (i in seq_len(times)) {
        message(str)
    }
}

Sunday, 26 May 2013

R gotcha - regular expressions

Just a quick post --- I came across this today and thought it was worth mentioning.

By default, the regular expression functions grep, gsub, regexpr, etc. use extended regular expressions. By passing in perl=TRUE as an argument, one can use Perl regular expressions.

Note that in extended regular expressions, the . character matches the newline character '\n'. In Perl regular expressions, it doesn't.

grep('.', '\n')
## [1] 1
grep('.', '\n', perl=T)
## integer(0)

Something to keep in mind if you use regular expressions in R with strings with embedded newlines and were having puzzling results.
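
If you want Perl regular expressions but still need . to match a newline, PCRE's inline (?s) ("dotall") modifier should do the trick. A small sketch comparing the three cases (the variable names are just for illustration):

```r
x <- "\n"

default.dot <- grepl(".", x)                    # default (extended) regex
perl.dot    <- grepl(".", x, perl = TRUE)       # Perl regex
perl.dotall <- grepl("(?s).", x, perl = TRUE)   # Perl regex with the (?s) flag

c(default = default.dot, perl = perl.dot, perl.dotall = perl.dotall)
```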

Wednesday, 17 April 2013

Installing RStudio Server on Fedora 18

I'm trying to install RStudio Server on Fedora 18 (64bit) but keep getting an error about libcrypto.so.6 and libssl.so.6. (I didn't enable EPEL as I am not on RedHat/CentOS).

Following the installation instructions, I:

wget http://download2.rstudio.org/rstudio-server-0.97.336-x86_64.rpm
sudo yum install --nogpgcheck rstudio-server-0.97.336-x86_64.rpm

But I get the error:

Error: Package: rstudio-server-0.97.336-1.x86_64 (/rstudio-server-0.97.336-x86_64)
           Requires: libcrypto.so.6()(64bit)
Error: Package: rstudio-server-0.97.336-1.x86_64 (/rstudio-server-0.97.336-x86_64)
           Requires: libssl.so.6()(64bit)

If I look in /usr/lib64 I have (both from the openssl-libs package: yum provides /usr/lib64/libssl.so.10):

/usr/lib64/libcrypto.so.10
/usr/lib64/libssl.so.10

So it appears my package versions are too new for RStudio Server (!).

I decided to try the hacky solution of making some links from the old version to the new (hoping that they are backwards-compatible):

sudo ln -s /usr/lib64/libcrypto.so.10 /usr/lib64/libcrypto.so.6
sudo ln -s /usr/lib64/libssl.so.10 /usr/lib64/libssl.so.6

Now I attempt to install again, and use rpm --nodeps instead of yum to force installation:

sudo rpm -ivh --nodeps rstudio-server-0.97.336-x86_64.rpm

It works!

Preparing...                          ################################# [100%]
Updating / installing...
   1:rstudio-server-0.97.336-1        ################################# [100%]
rsession: no process found
Stopping rstudio-server (via systemctl):                   [  OK  ]
Starting rstudio-server (via systemctl):                   [  OK  ]

(Note - if it's still complaining to you about not finding libraries, try

sudo ldd /usr/lib/rstudio-server/bin/rserver | grep 'not found'

You might get something like:

libssl.so.6 => not found
libcrypto.so.6 => not found

This will tell you which libraries you need to make links for.)

I can test whether it worked OK by opening a browser and pointing it to http://127.0.0.1:8787:

Huzzah! RStudio Server!

To enable external access I had to open port 8787 in my firewall (you could use the firewall applet for this instead of the command line):

iptables -A INPUT -p tcp --dport 8787 -j ACCEPT

Now I can continue with the rest of the instructions! Yay!

Tuesday, 16 April 2013

Getting RStudio to include your `R_LIBS_USER` in its library paths.

I recently installed RStudio Desktop version on my Linux computer, as I wanted to test it out. Previously I'd been happily using R from the command-line.

However, it kept complaining that I didn't have the knitr package installed.

Starting R from the command-line, I found that I did have knitr installed - library(knitr) worked fine! But library(knitr) from RStudio gave the error:

Error in library(knitr) : there is no package called ‘knitr’

Upon further inspection, I realised that my package library paths were different between command-line R and RStudio, despite the R executable being identical between the two.

My usual R library for command-line R is ~/R/library, and this was in the .libPaths() as expected (executed from R from the command line).

However, when I executed .libPaths() from RStudio, I got:

[1] "/usr/local/lib/R/site-library" "/usr/lib/R/site-library"
[3] "/usr/lib/R/library"            "/usr/lib/rstudio/R/library"

So why did the same R binary produce different library paths?

It turns out that I set my default path ~/R/library by setting my R_LIBS_USER environment variable in my .bashrc file:

export R_LIBS_USER=$HOME/R/library

but RStudio was not reading my .bashrc file when it started up (makes sense I guess, as it's not running from the terminal).

The solution was to create a file ~/.Renviron and set the R_LIBS_USER variable there. R looks for this file upon starting up to set environment variables (see also ?Startup):

R_LIBS_USER=~/R/library

(Note - I could also just do .libPaths(c('~/R/library', .libPaths())) in an .Rprofile file, but I don't use those in general).

Now I start up RStudio and hey presto! It all works.
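
A quick way to confirm that the variable was picked up is to run something like the following from inside RStudio. This is just a sketch; the variable names are arbitrary.

```r
# Where R thinks the user library is (empty string if the variable is unset).
user.lib <- path.expand(Sys.getenv("R_LIBS_USER"))

# Is that directory among the active library paths?
# (R silently drops library directories that do not exist.)
in.lib.paths <- normalizePath(user.lib, mustWork = FALSE) %in%
    normalizePath(.libPaths(), mustWork = FALSE)
in.lib.paths
```

If this prints FALSE, either the variable was not read or the directory it points to does not exist yet.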

Thanks to the folk at StackOverflow, in particular flodel and Dirk Eddelbuettel, who helped me work this out (the question has probably been deleted since then as it was too localized and probably should have been asked at the RStudio support page).

Thursday, 29 March 2012

Getting straight single quotes for code/verbatim in Sweave/knitr

Update 16 April 2012: Yihui has fixed this from knitr 0.5! Thanks!

I've recently started using knitr to write reports, mainly about code I've written in R.

One thing that I insist upon in documentation on code is that code snippets within the document be able to be copy-and-pasted easily so the user can follow along.

This is why I don't like having the following in my documents:

> words <- c('Hello','world!')
> paste(words)
[1] "Hello"  "world!"

If the user wants to perform the code I've just written, they can't simply select everything and copy-paste; the > symbols are going to get in the way.

I much prefer something like this:

words <- c('Hello','world!')
paste(words)
## "Hello"  "world!"

This is why I like knitr as opposed to Sweave on which it is based; knitr seems to be more flexible in suppressing the leading > in input commands, and putting comments (##) in front of the outputs.

However, knitr has an annoying drawback that Sweave doesn't when it comes to typesetting code: a single quote mark/apostrophe ' in Sweave will stay as such in the output; in knitr, it will be converted to a left or right single quote.

By default, LaTeX will change "straight" single quotes ' into left ‘ and right ’ single quotes that are curled depending on whether the quote is open or closed.

If you try and copy-paste these into a terminal you will run into trouble, and R will complain about "unexpected input in "??"", where the "??" may be a funny looking symbol (depending on your terminal) that basically means "I don't understand this fancy symbol you gave me!"

Now, Sweave has some way of dealing with this. It converts all of these funky quotes into normal straight quotes that can be safely copy-pasted into R. Knitr doesn't.

weird curly quotes in knitr

How to fix this? Well, there is a LaTeX package upquote that converts all single quotes that occur in a verbatim environment (or \verb commands) from left/right single quotes into straight single quotes.

It uses the textcomp package to access the command \textquotesingle which is the straight single quote (it also does backticks via \textasciigrave). The upquote package basically says "if you encounter a quote in a verb-like environment, make sure it's \textquotesingle!".

So how does this tie into getting straight quotes in Sweave/knitr? Easy: add \usepackage{upquote} and a fairly arcane command to your preamble:

\documentclass{article}
\usepackage{upquote} % to convert funny quotes to straight quotes
\setbox\hlnormalsizeboxsinglequote=\hbox{\normalsize\verb.'.}%
\begin{document}
<<eval=TRUE,echo=TRUE,tidy=FALSE>>=
    words <- c('Hello','world!')
@
\end{document}

Now you can just run knitr on this and then pdflatex, and voila! Straight quotes.

straight quotes in knitr!

How does this work?

For the more TeX-inclined among you, this is why it works.

First of all, when knitr processes an Rnw file (and makes a tex file as an output), it defines a whole bunch of individual characters and uses them in the output. Have a look at the preamble of a knitted document and you will see a whole bunch of:


\newsavebox{\hlnormalsizeboxclosebrace}%
\newsavebox{\hlnormalsizeboxopenbrace}%
....
\setbox\hlnormalsizeboxopenbrace=\hbox{\begin{normalsize}\verb.{.\end{normalsize}}%
\setbox\hlnormalsizeboxclosebrace=\hbox{\begin{normalsize}\verb.}.\end{normalsize}}%

There are lots and lots of these definitions. There appears to be one for each punctuation character and text size.

In particular there is one for the single quote mark, called \hlnormalsizeboxsinglequote. Every single time you have a single quote in a code chunk, knitr replaces this single quote with \usebox{\hlnormalsizeboxsinglequote}. Every single time you use any punctuation character at all within a code chunk, knitr will replace it with the relevant \hlnormalsize[charactername]. It's bizarre, and leads to very ugly code!

For example, the simple chunk in the example above gets rendered (in the tex document) like so:


\begin{knitrout}
\definecolor{shadecolor}{rgb}{0.969, 0.969, 0.969}\color{fgcolor}\begin{kframe}
\begin{flushleft}
\ttfamily\noindent
{\ }{\ }{\ }{\ }\hlsymbol{words}{\ }\hlassignement{\usebox{\hlnormalsizeboxlessthan}-}{\ }\hlfunctioncall{c}\hlkeyword{(}\hlstring{\usebox{\hlnormalsizeboxsinglequote}Hello\usebox{\hlnormalsizeboxsinglequote}}\hlkeyword{,}\hlstring{\usebox{\hlnormalsizeboxsinglequote}world!\usebox{\hlnormalsizeboxsinglequote}}\hlkeyword{)}\mbox{}
\normalfont
\end{flushleft}
\end{kframe}
\end{knitrout}

How gross!

Anyhow, remember that knitr inserts all the savebox commands before the preamble you put in your Rnw document. Well, \setbox operates such that it calculates the contents of the box straight away and saves it to the box register, and then forgets the definition of the box (i.e. the \normalsize\verb.'.).

What this means is that \hlnormalsizeboxsinglequote gets defined before the upquote package is even loaded, and hence the effect of upquote (redefining ' to \textquotesingle within verbatim commands) happens too late to affect the \verb.'. that occurs in \hlnormalsizeboxsinglequote.

To fix this, we would like to retrieve the definition of the \hlnormalsizeboxsinglequote command after we load the upquote package so that its definition gets re-parsed. Then we'd just have to type something like \edef\hlnormalsizeboxsinglequote\hlnormalsizeboxsinglequote to say "set \hlnormalsizeboxsinglequote to what it used to be, but re-read the definition first".

Unfortunately there appears to be no way to do this. Hence the only fix is to look up how knitr defines \hlnormalsizeboxsinglequote by grabbing it out of the preamble of a knitted document, and copy its definition into the preamble of the source document.

This works for now, but it means that if the knitr package changes how it defines \hlnormalsizeboxsinglequote (maybe in one revision they decide to make all quotes blue in colour), it is up to you to make sure that your redefinition of \hlnormalsizeboxsinglequote in your Rnw file matches the one used by knitr.

Wednesday, 7 March 2012

Be a NethackR!

Net Hack is one of the most amazing games of all time.

It's a rogue-like game, where the quest is to retrieve the Amulet of Yendor from the bottom of the dungeon and bring it back up to the top in order to sacrifice it to your deity and achieve immortal fame & glory, etc. Along the way, one must avoid the many (and I mean many) ways to die, including from the evil Wizard of Yendor, also known as Rodney.

Adventuring through the dungeon (aww, I died)

There are many, many, many ways to die in Net Hack. Also, there's no saving except to resume your game later - once you die, you die. You have to restart the game from scratch. Finally, the game comes with a small hints book to get you started, but no real instructions (like "don't look at Medusa or you'll die! Don't touch a cockatrice or you'll turn to stone! Don't eat too much or you'll die of overeating (not kidding!)").

These factors all make Net Hack a very, very, hard game. And yet addictive! I have yet to win the game after a couple of years of on and off playing, but I still love it.

Anyhow, I decided to write an R package that would let me play Net Hack in R (terminal version, of course! I wouldn't play the graphics version unless I didn't have a keyboard!).

Why would I want to play Net Hack from R? Well ... why not? :D

You can download it from here - either go to the 'Downloads' page and grab the .zip file and install within R (Packages -> Install package(s) from local zip files... OR install.packages('nethackR_1.0.1.zip',repos=NULL)), or if you feel hackerish and are running Cygwin or Linux (or Mac? haven't tested it there), you can grab the source, unzip, and type:

make
make install

After that, go into R, read the help file, and start a game!

library(nethackR)
?nethackR  # read some help files
?nethack   # read some help files
nethack()  # start a game!

You can even feed in nethack options:

nethack(dogname='Indy',catname='lolcatz',hilite_pet=TRUE,time=TRUE)

Enjoy! (and let me know of bugs, I'm sure there are some).

As a note - the package comes bundled with the Net Hack executable already. You may not feel secure running an exe that the package author (me) guarantees you is the actual NetHack.exe and not one filled with viruses. If so, download NetHack yourself and place it within the bin/your_OS-type folder in the nethackR folder of your R library. your_OS-type is either 'unix' (for Linux and Mac) or 'windows' (for Windows). That way you can be sure the executable is safe.

Onward NethackRs!

RIP yet another character!

Extra rambling (mainly for R people):

This was mostly an exercise in writing R packages - it was the first one I ever wrote and wanted something fun to motivate me.

It turned out to be much easier than I thought - you can just call system('nethack'), and R takes care of the rest, even the interactive part - it's as if I'd just run nethack from the terminal instead.

However, I then took this to Windows to test, and if I used the GUI console for R (Rgui.exe as opposed to Rterm.exe), NetHack would start but hang my system until I forcibly closed the NetHack.exe process using the System Manager.

I figured out the solution today by looking at the help file for system in R on Windows (the help file is different on Linux and didn't include this all-important information): I can't run interactive (text) programs in Rgui, it just doesn't work.

So instead, if the user uses Rgui, the package will launch a command prompt from which the user can play.
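
The front end can be detected through .Platform$GUI, which is "Rgui" in the Windows GUI console. The function below is a hypothetical sketch of the idea rather than the actual nethackR source: the name launch.command, the window title and the use of the Windows start command are all assumptions.

```r
# Hypothetical sketch, not the actual nethackR source: choose how to launch
# an interactive text program depending on the R front end.
launch.command <- function(exe = "nethack", gui = .Platform$GUI) {
    if (identical(gui, "Rgui")) {
        # Rgui cannot host interactive console programs, so ask Windows to
        # open the program in its own console window instead.
        sprintf('cmd /c start "NetHack" "%s"', exe)
    } else {
        # Terminal front ends (Rterm, R in a Unix shell) can run it directly.
        exe
    }
}

rgui.cmd <- launch.command("nethack", gui = "Rgui")
term.cmd <- launch.command("nethack", gui = "X11")
```

Either string could then be handed to system(); returning the command instead of running it keeps the sketch easy to check without starting a game.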

Thursday, 2 February 2012

A text-based file chooser in R

I've had to write some code allowing users to select various files for processing in R. Since they operate my script through a text console, I thought it'd be easier to provide some way for users to select the files they want processed interactively. Basically, a file dialogue in R. Furthermore, I wanted to avoid loading in any extra GUI packages (RGtk2, tcltk, ...) to keep the script light-weight.

Bingo! In R there's a command file.choose that allows a user to pick a file. In Windows or Mac, it comes up with a file dialog. However in Linux, it has this unhelpful interface:

> file.choose()
Enter file name:

That's it. It doesn't offer the ability to navigate through folders; you just have to know your entire file path and enter it in. I wanted something a bit more guided.

So, after asking this question at Stack Overflow, and with the help of the friendly people there, I wrote my own function to browse files:

#' Text-based interactive file selection.
#'@param root the root directory to explore
#'             (default current working directory)
#'@param multiple boolean specifying whether to allow 
#'                 multiple files to be selected
#'@return character vector of selected files.
#'@examples 
#'fileList <- my.file.browse()
my.file.browse <- function (root=getwd(), multiple=F) {
    # candidates: the parent directory (shown as ../) plus the contents of root
    x <- c( dirname(normalizePath(root)), list.files(root,full.names=T) )
    isdir <- file.info(x)$isdir
    obj <- sort(isdir,index.return=T,decreasing=T)
    isdir <- obj$x
    x <- x[obj$ix]
    lbls <- sprintf('%s%s',basename(x),ifelse(isdir,'/',''))
    lbls[1] <- sprintf('../ (%s)', basename(x[1]))

    files <- c()
    sel = -1
    while ( TRUE ) {
        sel <- menu(lbls,title=sprintf('Select file(s) (0 to quit) in folder %s:',root))
        if (sel == 0 )
            break
        if (isdir[sel]) {
            # directory, browse further
            files <- c(files, my.file.browse( x[sel], multiple ))
            break
        } else {
            # file, add to list
            files <- c(files,x[sel])
            if ( !multiple )
                break
            # remove selected file from choices
            lbls <- lbls[-sel]
            x <- x[-sel]
            isdir <- isdir[-sel]
        }
    }
    return(files)
}

It basically looks at the directory you specify and allows you to choose folders to explore or files to select. You can select multiple files too. The magic boils down to the menu function, which, given a vector x, will provide a text interface asking the user to choose one element of x. Example use:

> fl <- my.file.browse('path/to/dir',multiple=T)
Select file(s) (0 to quit) in folder path/to/dir:

1: ../ (to)
2: subdir/
3: b.jpg
4: a.txt

Selection: 3
Select file(s) (0 to quit) in folder path/to/dir:

1: ../ (to)
2: subdir/
3: a.txt

Selection: 2
Select file(s) (0 to quit) in folder path/to/dir/subdir:

1: ../ (dir)
2: c.jpg

Selection: 2
Select file(s) (0 to quit) in folder path/to/dir/subdir:

1: ../ (dir)

Selection: 0
>
> fl
[1] "path/to/dir/b.jpg"
[2] "path/to/dir/subdir/c.jpg"

Nifty! (of course, it could do with improvements - allow for selecting directories say, or filter files shown).
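
As a rough idea of how the file filter might work, here is a hypothetical helper that keeps every directory (so you can still navigate) but only those files whose names match a regular expression. It is not part of my.file.browse; the name list.candidates and its pattern argument are assumptions.

```r
# Hypothetical helper (not part of my.file.browse): list the entries of
# `root`, always keeping directories but keeping only those files whose
# names match `pattern` (a regular expression), if one is given.
list.candidates <- function(root = getwd(), pattern = NULL) {
    entries <- list.files(root, full.names = TRUE)
    isdir <- file.info(entries)$isdir
    if (is.null(pattern)) {
        return(entries)
    }
    entries[isdir | grepl(pattern, basename(entries))]
}

# Try it on a throwaway directory.
demo.root <- file.path(tempdir(), "file-browse-demo")
dir.create(file.path(demo.root, "subdir"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(demo.root, c("a.txt", "b.jpg"))))
jpgs <- basename(list.candidates(demo.root, pattern = "\\.jpg$"))
```

In my.file.browse this would replace the plain list.files(root, full.names=T) call, with the pattern passed down through the recursive calls.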