Matt Nourse, August 15 2011
In the first enthralling installment I compared MySQL, PostgreSQL and r17 on a spinning disk. That comparison was straightforward but this SSD bakeoff comes with some hand-waving. I've halved the data set size because MySQL's indexing temporary files for the full data set would not fit on the SSD. R17's compressed data format is so much smaller than either MySQL or PostgreSQL that you're more likely to be able to take advantage of SSD performance with r17. This is a "good thing" but just how good it is varies between data set types and sizes (I'm waving my hands very expressively at this point).
Anyway, on with the summary graph:
Total load+index+query time
All times are in minutes- less is better.
This is the same as the scenario in part 1 except that there are only 100 million page views so the total data set size is just 27 GB. Also I've only used the "parallel concurrency" r17 mode. I've included the "parallel concurrency" figures for spinning disk just for ease of comparison.
For PostgreSQL and MySQL, this consists of loading the data into the database, then creating additional indexes. For r17 this step translates the TAB-separated-value file into r17 native format. R17 has no indexing step. The database schema and import scripts are the same as in part 1.All times are in minutes- less is better.
Disk space usage is important while SSD storage is significantly more expensive than mechanical disk storage. The SSD in this test costs about US$2.90 per usable GB. The mechanical disk in part 1 costs around US$0.05/GB.All sizes are in Gigabytes- less is better.
I ran a
VACUUM FULL ANALYZE command against the PostgreSQL database. It completed in less than 10 seconds.
This query orders all users by the number of times they appear in the logs. Full SQL and r17 scripts are the same as for part 1.All times are in minutes- less is better.
MySQL is the clear winner for this query, but note that MySQL also uses more space to store the data so would be the first to exceed available SSD storage space. Bakeoff part 1 shows that MySQL indexing performance on mechanical disk drops off a cliff when the data set exceeds the available RAM.
This query sorts all users by the number of times they appear in the log, gets the 10 most active users, joins the list of 10 most active users to the log, grabs the URLs from the joined records and sorts by the number of times those URLs appear. Full SQL and r17 scripts are the same as for part 1.All times are in minutes- less is better.
R17's parallel processing (particularly parallel join) is what gives it the winning edge here. My testing against other database systems suggests that it performs even better for queries with more joins. I'm keen to explore more complicated queries with more joins in future posts.
R17 provides competitive performance on SSD especially for data exploration or when the rate of data change is high. Even r17's mechanical disk performance compares well to MySQL and PostgreSQL on SSD for this scenario.
In the next post I'll bring Hadoop into the mix and in later posts I'll explore some more complicated scenarios.
r17 is currently a free download. I'd love to hear about your experiences with it via email (email@example.com) or via the comments below.
This is the same as for part 1 except that the spinning disk is swapped for an SSD.
Quad-core Intel i7 930 2.8GHz
12 GB DDR3 RAM
128GB OCZ Vertex 3 MAX IOPS SSD
Ubuntu 10.04 LTS
All other technical details are as for part 1.