Examples

  • External applications can participate at the start, the middle or the end of queries.

    meta.shell("java NetworkSniffer")
    | rel.from_usv()
    | rel.join.natural("interesting_ips.r17_native")
    | rel.select(ip_address)
    | rel.group()
    | rel.to_usv()
    | meta.shell("python visualise.py");

    "USV" is easy and fast to parse and generate. It uses ASCII 31 ("Unit Separator") to delimit fields and ASCII 0 to delimit records. r17 also supports TAB-separated-value and regex-parseable data, but both tend to be slower and harder for external applications to parse.
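
    As a rough illustration of how little an external application needs in order to read and write USV, here is a minimal Python sketch. It relies only on the two delimiters described above. The function names are hypothetical, and it assumes that every record (including the last) is terminated by the record delimiter and that any heading is simply the first record; neither detail is confirmed here.

```python
FIELD_SEP = b"\x1f"   # ASCII 31, "Unit Separator", delimits fields
RECORD_SEP = b"\x00"  # ASCII 0, delimits records


def parse_usv(data: bytes) -> list[list[bytes]]:
    """Split a USV byte string into records, each a list of raw field values."""
    records = data.split(RECORD_SEP)
    # Assumption: records are terminated (not merely separated) by RECORD_SEP,
    # so a trailing delimiter leaves one empty chunk at the end. Drop it.
    if records and records[-1] == b"":
        records.pop()
    return [record.split(FIELD_SEP) for record in records]


def generate_usv(records) -> bytes:
    """Join records into USV, ending every record with RECORD_SEP."""
    return b"".join(FIELD_SEP.join(fields) + RECORD_SEP for fields in records)
```

    Working on bytes rather than text avoids any encoding decisions; a script such as visualise.py could read sys.stdin.buffer, call parse_usv, and decode only the fields it needs.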

  • meta.shell("for FILE in *.xls; do py_xls2csv \"$FILE\" | grep '^\"'; done")
    | rel.from_text('"[^"]*?", "[^"]*?", "([^"]*?)", "[^"]*?", "[^"]*?", "([^"]*?)"', "string:email", "string:referrer_url")
    | rel.select("UPDATE customer SET referrer_url = '" + referrer_url + "' WHERE email = '" + email + "';" as sql)
    | rel.to_tsv();

    This example translates XLS to SQL using a Python XLS-to-CSV converter. Quote escaping is omitted for brevity.
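
    The quote escaping that the XLS to SQL example omits could be handled in an external step. The Python sketch below builds the same UPDATE statement while doubling embedded single quotes, which is the standard SQL way to escape them inside a string literal. The function names are hypothetical, and some databases recognise additional escape characters (backslash, for instance), so treat this as an illustration rather than a complete defence against malformed input.

```python
def sql_quote(value: str) -> str:
    """Return value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def update_statement(email: str, referrer_url: str) -> str:
    """Build the UPDATE statement from the example with both values escaped."""
    return (
        "UPDATE customer SET referrer_url = " + sql_quote(referrer_url)
        + " WHERE email = " + sql_quote(email) + ";"
    )
```

    Where the target database is reachable from the script, parameterised queries are usually a safer choice than generating SQL text.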

  • # Distribute analytics processing using the R statistics language.
    # This example only distributes processing amongst the CPUs on the local machine
    # but the same principle applies to distributing processing across multiple
    # machines. The R script would just need to be distributed in advance.

    # Parse & split the input. In the "real world" this would probably be done separately.
    io.file.read("distributed_r_example_input.tsv")
    | rel.from_tsv()
    | rel.record_split(10, "tmp.distributed_r_example.");

    # Distribute the R operation amongst the CPUs on the local machine.
    io.directory.list(".")
    | rel.where(str.starts_with(file_name, "tmp.distributed_r_example."))
    | rel.select(file_name, "localhost" as host_name)
    | meta.parallel_explicit_mapping(
        rel.to_tsv()
        | meta.shell("Rscript --slave increment.R")
        | rel.from_tsv())
    | rel.order_by(value)
    | rel.to_tsv();

    For this example, increment.R is:

    table <- read.table(header=TRUE, file="stdin", sep="\t", quote="")
    valueP1 <- table$int.value+1
    write.table(sep="\t", quote=FALSE, row.names=FALSE, col.names=c("int:value"), valueP1)

    ...and distributed_r_example_input.tsv is just a list of numbers in TAB-separated-value format:

    int:value
    19
    17
    60
    40
    26
    .
    .
    .
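
    Any program that honours the same stdin/stdout contract could stand in for increment.R. As a sketch, the Python below reads TAB-separated input with a single int:value column, copies the heading line through unchanged, and writes each value plus one. The function name increment_stream is hypothetical, and the sketch assumes exactly one column per line, as in the sample input above.

```python
import io


def increment_stream(lines, out):
    """Copy the heading line, then write each integer value incremented by one."""
    lines = iter(lines)
    header = next(lines, None)
    if header is None:
        return  # empty input: nothing to write
    out.write(header.rstrip("\n") + "\n")
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            out.write(str(int(line) + 1) + "\n")


# Demonstration on in-memory data. A real script would instead import sys
# and call increment_stream(sys.stdin, sys.stdout).
demo_out = io.StringIO()
increment_stream(io.StringIO("int:value\n19\n17\n"), demo_out)
print(demo_out.getvalue(), end="")
```

    Saved as a script that passes sys.stdin and sys.stdout to increment_stream, this could be invoked from meta.shell in place of the Rscript command.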