Justin Santa Barbara’s blog

Databases, Clouds and More

Day 6: A Redis View of the World

I’m building a set of open-source cloud data-stores in December, blogging all the way.

In our last episode, we added garbage collection to the BTree, so it’s now at the point where we can start using it to implement useful functionality.

Today, I added a basic Redis front-end to the RESTful API we already had. I had a huge head-start because I was able to work from the redis-protocol project, which actually implements a Redis server in Java. That project (like Redis itself) is designed around a single-server, in-memory implementation. The redis-protocol project is well designed, and it would have been fairly easy to plug in our distributed backend implementation. Nonetheless, I rewrote the code to make sure I had a good understanding of it, though I followed almost exactly the same design. So implementing basic support for the Redis protocol was fairly quick, but I owe all that to the redis-protocol project.

Like redis-protocol, I used the excellent Netty library, which makes dealing with asynchronous I/O in Java easy. Unfortunately, Netty recently went through a major version change (from 3 to 4), and the documentation, both official and informal (blog posts, sample code, StackOverflow questions), hasn't quite caught up. It is noticeably better than it was six months ago and continuously improving, so it's definitely worth going with the latest version (for new projects, at least).

The Redis protocol itself is a text-encoded protocol pretending to be a binary protocol; e.g. each variable-length component is preceded by its length, encoded as text. It's a lot more work to parse than a true binary protocol would be (both in terms of code and in terms of CPU overhead). It seems like an odd design decision…
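To make that framing concrete, here is a minimal sketch of how a Redis command is encoded on the wire. The class and method names are my own, purely for illustration, and arguments are assumed to be ASCII (the real protocol counts bytes, not characters):

```java
// Illustrative sketch of RESP (the Redis wire protocol) framing.
// A command is an array of bulk strings: *<argc>\r\n, then
// $<length>\r\n<bytes>\r\n for each argument, with lengths as text.
public class RespEncoder {
    public static String encode(String... args) {
        StringBuilder sb = new StringBuilder();
        sb.append('*').append(args.length).append("\r\n");
        for (String arg : args) {
            // Length prefix is the byte length encoded as decimal text;
            // for these ASCII examples, String.length() matches it.
            sb.append('$').append(arg.length()).append("\r\n")
              .append(arg).append("\r\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "SET mykey hello" as it travels over the wire:
        System.out.print(encode("SET", "mykey", "hello"));
    }
}
```

Every argument carries a textual length prefix, so a parser has to read digits and convert them to an integer before it knows how many bytes to consume — exactly the work a fixed binary header would avoid.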

On that note, there was a great talk on HTTP 2.0 given at Heavybit by Ilya Grigorik (video coming soon, I hope!) In it, he pointed out that HTTP has great advantages: great client support, firewall and proxy friendliness, great developer tools, and in general an incredibly complete infrastructure around it. Where it falls down is that performance can be inferior to a raw TCP connection (particularly if you want concurrent requests), so high-speed protocols typically end up rolling their own binary-over-TCP protocol. These don't get any of the benefits that come with HTTP, but they are fast. However, with HTTP 2.0 (the protocol formerly known as SPDY), HTTP is now a multiplexed binary protocol, and should be comparable in performance to a hand-rolled binary protocol. In other words, there should hopefully be no new binary protocols. Even today, before HTTP 2, you should probably base your protocol on HTTP if you can get away with it, knowing that a big performance boost is coming with HTTP 2. You can still beat HTTP 2 with a custom binary protocol in theory, but HTTP 2 will beat the protocol you build in practice.

The next step for our Redis experiment was to implement the basic Redis commands, notably get and set. With those, it's now possible to access our key-value store using the Redis protocol, so all the Redis libraries and tooling should work (for the small subset of commands I've implemented).
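As a rough illustration of what sits behind the protocol layer, here's a hypothetical dispatch of GET and SET onto a plain in-memory map. The real backend is our distributed BTree, and none of these names come from the actual codebase — this just shows how the two commands map onto reads and writes of a generic key-value store, and the reply formats Redis clients expect:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: GET/SET dispatched onto a generic key-value
// interface, returning RESP-style reply strings.
public class MiniRedis {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    public String execute(String... args) {
        switch (args[0].toUpperCase()) {
            case "SET":
                store.put(args[1], args[2]);
                return "+OK\r\n";                  // simple-string reply
            case "GET":
                String v = store.get(args[1]);
                return v == null
                        ? "$-1\r\n"                // null bulk reply
                        : "$" + v.length() + "\r\n" + v + "\r\n";
            default:
                return "-ERR unknown command\r\n"; // error reply
        }
    }
}
```

Because the replies follow the standard wire format, any Redis client library speaking to this front-end sees a (very small) Redis server.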

It is interesting to compare this to official Redis. Our big advantage is that we are distributed. Redis is persistent, which is great, but then the big question is "what happens when a machine fails?" Memcache has a much more coherent answer here, because it is a cache: values can go away for any reason. Redis doesn't have quite the same self-consistent answer.

As well as HA, we also have a multi-threaded implementation: we support concurrent reads, though our writes are serialized by Raft. This came about naturally; I haven't been carefully coding all along to support concurrent operation. We've inherited the concurrent-reader design from LMDB, and some thought was required when it came to garbage collection, but everything else is just sensible modern Java: minimizing mutability, basic locking around shared data structures, and so on. Of course, I know there are plenty of threading bugs still left, but the design is naturally multi-threaded in a way that it wouldn't be in other languages (like C).
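The reader/writer split described above can be sketched in a few lines of plain Java. This is a simplified copy-on-write snapshot pattern, not the actual LMDB-derived BTree code (which shares unchanged pages rather than copying the whole structure), and the names are mine:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the concurrent-reader design: readers grab an immutable
// snapshot with no locking; writers are serialized (in the real system,
// by Raft) and publish a fresh snapshot atomically.
public class SnapshotStore {
    private volatile Map<String, String> snapshot = new HashMap<>();
    private final Object writeLock = new Object();

    // Concurrent readers: just a volatile read of the current root;
    // the snapshot is never mutated after publication.
    public String get(String key) {
        return snapshot.get(key);
    }

    // Serialized writers: copy, mutate, then publish the new version.
    public void put(String key, String value) {
        synchronized (writeLock) {
            Map<String, String> next = new HashMap<>(snapshot);
            next.put(key, value);
            snapshot = next; // readers atomically see the new version
        }
    }
}
```

Readers never block writers and never see a half-applied write; the cost is that each write builds a new version, which is why the garbage collection work from the last episode matters — old versions have to be reclaimed once no reader holds them.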

Because we operate on a cluster of machines, writes will be much slower than official Redis. I suspect we’ll be at least as fast as reliable replicated writes will ever be in Redis. Benchmarking reads (after some performance profiling) against Redis would be very interesting: we should be slower on a per-request basis, because we’re in Java and because of our copy-on-write database, but making up for that is the fact that we can run requests concurrently, even making use of multiple cores. I suspect for mostly-read benchmarks we may be able to get much higher throughput.

But what’s more interesting to me than a pure speed contest is the idea that Redis is just a protocol on a universal key-value store, and one that we’ve built to work well on “the cloud”. We can implement the memcache protocol, the MongoDB protocol, a SQL database protocol; all backed by the same data store. Things are about to get interesting…