I’m building a set of open-source cloud data-stores in December, blogging all the way.
In our last episode, we added multi-row transactions to our data-store, and examined the options for SQL parsing. PrestoDB seemed like the best for our requirements, though it didn’t offer fast enough execution of simple queries.
Today I got a basic SQL query to run. Just a single-table scan, and there are still some bits hard-coded, but it works. We can parse a query using the PrestoDB parser, check if it is sufficient for us to execute it directly, and (if so) run it. This is a great step forward!
PrestoDB is a fairly complicated system, but the SQL parser is actually fairly well self-contained. We have to set up a fair bit of infrastructure (like table metadata), but within a few hours I was able to get a unit test using the PrestoDB SQL parser to parse a simple SQL statement. PrestoDB also generates what it calls a “logical plan”, which is an execution plan before considering that some tables may be distributed. That’s exactly what we want, as for the moment our queries won’t be distributed.
So now the tricky bit: executing the SQL statement, using the PrestoDB plan. The idea here is that PrestoDB’s overhead won’t matter on a query that takes a second or more anyway, so we only need to concern ourselves with queries that should be fast. In particular, we really care about queries that only read from a single table and ideally use an index to narrow down the rows.
Even that is incredibly complicated; not least because PrestoDB doesn’t really support indexes, so it doesn’t give us a plan with index support (we’ll have to do that ourselves!) So, I worked towards executing a single-table query-scan (i.e. no indexes). I also put in the infrastructure for more advanced queries, even if we don’t use it yet.
Sure enough, around 5PM I finally got the first SQL execution. I take the logical plan (which is structured as a tree), and build another tree for the execution plan, using the Visitor design pattern. PrestoDB (like most SQL query engines and also code compilers) is essentially structured around a series of transformations, going from a tree that is a direct mapping of the SQL query, through lots of intermediate stages, to a final tree representing the execution flow. The final execution tree is then executed directly, often using a “pull model” where the next row is retrieved whenever the caller requests it; each node in the execution tree typically retrieves the next node by pulling rows from its children, etc. PrestoDB instead uses a pull-model, I think mostly for performance reasons: it allocates (fairly large) buffers into which is writes the rows and the passes those buffers to the dependent nodes. This results in fewer method calls and also has better memory performance. I suspect it also works much better with their dynamic code compilation approach.
I implemented a mixture of these approaches, where the rows are pushed to the reader, but function evaluation is done using the pull-model. I also implemented an experimental ValueHolder, which is a mutable buffer for a value; it’s supposed to reduce GC pressure a little (by allowing the value objects to be re-used, instead of building a new object every row). This is probably all over-engineered and I’ll rework it over time as I see what’s good and what’s not, but it seemed more important just to get something working.
The rows are then streamed into the HTTP output stream, and we can parse them in the client, just as we’ve done for key/value scans in the past.
Seeing the first SQL query succeed was a huge thing. This means we don’t have to invent our own query language, which would be painful to code, and even more painful for the callers who would have to learn another querying approach. There are always people that prefer the new thing because it’s new, but SQL seems like a much better option for everyone else!
The biggest remaining issue with SQL is that the performance isn’t great. I did some basic performance tuning, reducing logging output and re-using the SQL parser, which made things go twice as fast (in a quick micro-benchmark). Some simple profiling using VisualVM showed that PrestoDB’s SQL parsing and analysis is now the slowest phase. I could work on tuning it, but the bigger win would come from caching the parses and re-using them across identical queries. I may shelve that idea for a later time; we’re getting 300 queries per second (in a single sequential client thread) which is actually probably good enough for now. It’s also good to see that my “quick” query execution is actually quick enough that the SQL parsing is the bottleneck - it suggests that the approach works!