
I was digging around for an open source statistics package today and came across R, a GPLed statistics and data analysis suite. Sweet!

R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

One of R’s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.
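To give a feel for the mathematical annotation mentioned above, here is a minimal sketch using base graphics. The data is made up for illustration; `expression()` is R's standard way of putting symbols such as Greek letters into labels.

```r
# Made-up data: one period of a sine curve
x <- seq(0, 2 * pi, length.out = 50)
y <- sin(x)

# Draw a line plot; expression() renders theta as a Greek letter in the labels
plot(x, y, type = "l",
     xlab = expression(theta),
     ylab = expression(sin(theta)),
     main = expression(y == sin(theta)))
```

Everything not specified here (margins, tick marks, fonts) falls back to the defaults, and each one can be overridden with further arguments.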

So I’ve been messing around with this for the last half hour and it’s really an exciting package, especially if you’re a coder or unix geek. You interface with R through a command line programming interface, executing simple statements, setting variables, and defining functions. It feels similar to issuing commands at a unix prompt, except you’re working with data sets instead of file descriptors.
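Here is a rough sketch of what a short console session might look like. The names `heights` and `range_width` and the numbers are invented for illustration.

```r
# Set a variable: a numeric vector of made-up values
heights <- c(150, 160, 170, 180)

# Execute a simple statement: the arithmetic mean of the vector
mean(heights)

# Define a function returning the gap between the largest and smallest value
range_width <- function(x) {
  max(x) - min(x)
}

# Call the function on the vector defined above
range_width(heights)
```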

What’s cool is the robust capability of the standard function set. Want to read in a data set from a tab delimited table you found on the internet? Check this out:

# Read a table in from a URL (tab delimited table with a header row of column names)
Mydata <- read.table('http://someserver.com/table.txt', header=TRUE, sep='\t')

# Display summary (mean, median, min, max, etc.) for each column
summary(Mydata)

# Get the standard deviation for the values in column "foo"
attach(Mydata)
sd(foo)
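If you want to try the same steps without a file on a server, the sketch below feeds `read.table` a small made-up table through its `text` argument. The columns `foo` and `bar` and their values are placeholders, not real data.

```r
# Made-up, tab delimited data with a header line; stands in for a remote file
raw <- "foo\tbar\n1\t10\n2\t20\n3\t30\n4\t40"

# text= makes read.table parse the string instead of opening a file or URL
Mydata <- read.table(text = raw, header = TRUE, sep = "\t")

# Per-column summary statistics
summary(Mydata)

# Refer to the column explicitly instead of attaching the data frame
sd(Mydata$foo)
```

Writing `Mydata$foo` avoids `attach`, which can cause confusion when a column shares its name with another variable in your session.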

Learning the command set is a little daunting at first, but the console even does tab completion. If you don’t know what a function does, just put a question mark before it. For instance, “?sd” will quickly pull up help for the standard deviation function.
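A few related ways to look things up from the console, sketched with `sd` as the example:

```r
# Open the help page for sd (same as help("sd"))
?sd

# Show the function's arguments without opening the help page
args(sd)

# Run the examples from the bottom of the help page
example(sd)
```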

I’ve only scratched the surface, but there are links below to some R beginner guides which should help you get started. Anyone out there more familiar with the package? Please share any useful links and tips in the comments.

The R Project for Statistical Computing – Link
An Introduction to Statistical Computing in R – Link
Producing Simple Graphs with R – Link