In our last post we looked at the performance of Spark vs hand-written C code for a single query, Q6, from the standard TPC-H benchmark. We saw that Flare is able to accelerate the Spark query by a factor of 20, to exactly the same performance as the hand-written C program.
In this post, without further ado, we present results for the full TPC-H suite of 22 queries. We are again interested in single-core performance, mainly to gauge the inherent overheads in Spark, which Flare reduces by a large margin.
Why single-core? Quoting Paul Barham via Frank McSherry: “You can have a second computer once you’ve shown you know how to use the first one”. Premature scale-out has immediate drawbacks in terms of higher datacenter and operating costs, and inefficient use of energy may have consequences as far-reaching as contributing to global warming.
All numbers below are in milliseconds (ms), measured on a single core, after pre-loading the data into memory. We show results first for the SF1 dataset (1 GB), then for SF10 (10 GB).
We can see that Flare exhibits large speedups, not only compared to plain Spark, but also to widely used relational database systems like PostgreSQL.
What does this mean for scalability? By using each individual core much more efficiently, Flare can process data at much lower cost. For some queries, we would have to run Spark on hundreds of cores, quite likely spread across a cluster of tens of machines – and with perfect scalability – to achieve the same performance as Flare on a single core.
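The arithmetic behind that claim can be sketched in a few lines of Python. This is a back-of-the-envelope model, not a measurement: the function name `cores_to_match` and the `parallel_efficiency` parameter are our own illustrative assumptions, and the model simply treats n cores at efficiency e as delivering n × e times the single-core throughput.

```python
import math


def cores_to_match(speedup: float, parallel_efficiency: float = 1.0) -> int:
    """Estimate how many cores a slower system needs to match a faster
    system running on one core.

    speedup: single-core speedup of the fast system over the slow one.
    parallel_efficiency: fraction of ideal linear scaling the slow system
        actually achieves (1.0 = perfect scalability; an assumed knob).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    if not 0 < parallel_efficiency <= 1:
        raise ValueError("parallel_efficiency must be in (0, 1]")
    # n cores deliver n * efficiency times the single-core throughput,
    # so we need the smallest n with n * efficiency >= speedup.
    return math.ceil(speedup / parallel_efficiency)


# The 20x figure is the Q6 speedup from the previous post.
print(cores_to_match(20.0))        # 20 cores under perfect scaling
# Hypothetical inputs: a 200x speedup and 50% scaling efficiency.
print(cores_to_match(200.0, 0.5))  # 400 cores
```

With perfect scalability the required core count equals the speedup, which is why a speedup in the hundreds translates into hundreds of cores. Any loss in scaling efficiency only pushes that number higher.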
Now what about parallel performance in Flare? That’ll be a topic for another post.