Python and R integration in r17 1.7.1

Matt Nourse, October 15 2012

This is a seriously delayed blog post covering r17's 1.7.1 release back in August. I've been helping out with a few different web projects for the last few months including a gnarly J2EE beast that serves a busy site across a couple of data centres. I'm starting to get some breathing room now.

The big news for 1.7.x is that now you can write Python and R code inside your r17 script, making it even easier than before to use r17 as a scalability/performance helper for those languages. And of course now it's also easier to use Python's and R's vast libraries from within r17.

The new lang.python(@@@ python @@@) stream operator executes Python code using the python interpreter in the shell's path. The Python script's standard input is the input stream. The Python script's standard output is the output stream. R17 prepends helper Python code to the Python script before passing it to the system's Python interpreter, so all you really need to worry about is the Python logic, not mucking around with serialization and deserialization issues.

The simplest possible example is copying the input stream to the output stream:

...
| lang.python(@@@
for inputR in r17InputStream:
    r17OutputStream.write(inputR)
@@@) | ...

r17 doesn't add whitespace to or remove whitespace from the Python script, because indentation is so important in Python. So you need to indent inline Python code as if it were in its own file, with top-level statements starting in the first column.
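To see why this matters, here is a small sketch in plain Python, run outside r17. The built-in compile() accepts a snippet whose top-level statements start in the first column, and rejects the same snippet when every line is pushed right. The helper name compiles and the two sample snippets are made up for illustration.

```python
def compiles(source):
    """Return True if Python can parse `source` as a standalone script."""
    try:
        compile(source, "<inline>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


# Top-level statements start in column 0, as they would in their own file.
flush_left = "for n in [1, 2]:\n    print(n)\n"

# The same code with every line pushed right by four spaces.
pushed_right = "    for n in [1, 2]:\n        print(n)\n"

print(compiles(flush_left))
print(compiles(pushed_right))
```

The first call prints True and the second prints False, which is the failure you would hit if the inline script were indented to line up with the surrounding r17 code.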

This example assumes that the input stream contains a "value" column that is some kind of number:

...
| lang.python(@@@
for inputR in r17InputStream:
    r17OutputStream.write({"value": inputR.value + 1})
@@@) | ...
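If you want to try loop logic like this before running it under r17, one option is to fake the two helper objects in plain Python. The sketch below is only a stand-in, not r17's helper code: the Record and FakeOutputStream classes and the add_one function are hypothetical. It assumes that input records expose columns as attributes and that write() takes a dict keyed by column name, which is what the example suggests but not something spelled out here.

```python
class Record:
    """Hypothetical stand-in for an r17 input record: columns become attributes."""

    def __init__(self, **columns):
        self.__dict__.update(columns)


class FakeOutputStream:
    """Hypothetical stand-in for r17OutputStream: collects written rows in a list."""

    def __init__(self):
        self.rows = []

    def write(self, row):
        self.rows.append(row)


def add_one(input_stream, output_stream):
    """Same loop as the inline script, wrapped in a function for a dry run."""
    for inputR in input_stream:
        output_stream.write({"value": inputR.value + 1})


# Any iterable of records will do as a fake input stream.
r17InputStream = [Record(value=1), Record(value=41)]
r17OutputStream = FakeOutputStream()
add_one(r17InputStream, r17OutputStream)
print(r17OutputStream.rows)
```

Once the logic behaves on the fake stream, the body of the loop can be pasted into lang.python(@@@ ... @@@) without the stand-in classes.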

Similarly, lang.R(@@@ R code @@@) executes R code using the Rscript interpreter in the shell's path. R17's helper code supplies an r17InputTable table and an r17WriteTable function. r17InputTable contains the entire stream contents. The table's column names match the r17 stream headings. The definition of r17WriteTable is:

r17WriteTable <- function(colNames, t) {
    write.table(sep="\t", quote=FALSE, row.names=FALSE, col.names=colNames, t)
}

R17 can't infer the output table's column types, so you need to include the r17 types in the column names, for example:

r17WriteTable(c("string:name", "int:value"), table)
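For comparison, here is a rough Python sketch of the output shape that the write.table call in r17WriteTable produces: one tab-separated header line carrying the typed column names, then one tab-separated line per row, with no quoting. The function name write_typed_table and the sample rows are invented for illustration, and none of this is taken from r17's helper code.

```python
import io


def write_typed_table(col_names, rows, out):
    """Write a "type:name" header line, then rows, tab-separated and unquoted."""
    out.write("\t".join(col_names) + "\n")
    for row in rows:
        out.write("\t".join(str(cell) for cell in row) + "\n")


# Write to an in-memory buffer so the result is easy to inspect.
buffer = io.StringIO()
write_typed_table(["string:name", "int:value"], [("alice", 1), ("bob", 2)], buffer)
print(buffer.getvalue(), end="")
```

The header line is where the types live; the data rows themselves carry no type information, which is why the column names have to.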

That's it! Mashing up Python, R and r17 is now super-easy.

Happy mashing!
