I’m building a set of open-source cloud data-stores in December, blogging all the way.
In our last episode, we added lots of Redis commands to our key-value store, cleaned up the architecture a little bit (introducing the command design pattern to cope with the ever-growing number of commands), and struggled with Redis’ lack of compare-and-swap.
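For readers who missed that episode, here is a rough sketch of what the command pattern looks like for a Redis-style server: each command becomes its own object, and a registry maps command names to handlers, so adding a command means adding a class rather than growing a switch statement. The class names, the `dispatch` method and the error string (loosely modeled on Redis' "unknown command" reply) are all hypothetical, not the project's actual code.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class CommandDispatchSketch {

    /** One object per command (hypothetical interface, not the project's real API). */
    interface Command {
        String execute(Map<String, String> store, List<String> args);
    }

    static final class GetCommand implements Command {
        @Override
        public String execute(Map<String, String> store, List<String> args) {
            return store.get(args.get(0)); // null models Redis' nil reply
        }
    }

    static final class SetCommand implements Command {
        @Override
        public String execute(Map<String, String> store, List<String> args) {
            store.put(args.get(0), args.get(1));
            return "OK";
        }
    }

    private final Map<String, Command> commands = new HashMap<>();
    private final Map<String, String> store = new HashMap<>();

    CommandDispatchSketch() {
        // New commands are registered here instead of being added to a switch.
        commands.put("GET", new GetCommand());
        commands.put("SET", new SetCommand());
    }

    /** Look up the handler by case-insensitive command name and run it. */
    String dispatch(String name, List<String> args) {
        Command command = commands.get(name.toUpperCase(Locale.ROOT));
        if (command == null) {
            return "ERR unknown command '" + name + "'";
        }
        return command.execute(store, args);
    }
}
```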
Well, I found it: Redis does support compare-and-swap, although it’s a little bizarre and seemingly deprecated in favor of the even-more-bizarre Lua scripting. Redis implements something more akin to load-link/store-conditional than the simpler compare-and-swap. It requires issuing 5 commands (WATCH, GET, MULTI, SET, EXEC) and needs an extra network round-trip, but it’s there. So, we can use that to build cool things that need compare-and-swap, which is what I’m actually trying to do!
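To make the WATCH / GET / MULTI / SET / EXEC dance concrete, here is a toy, single-process model of its semantics with a compare-and-swap built on top. It is not a Redis client. `ToyRedis`, its per-key version counter and `execSet` are hypothetical stand-ins for what the real server does: WATCH remembers the key's state, and EXEC aborts (returns nil) if anyone wrote to the key in between, in which case the caller re-reads and retries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class WatchCasDemo {

    /** Toy in-memory "server": every write bumps the key's version. */
    static final class ToyRedis {
        private final Map<String, String> values = new HashMap<>();
        private final Map<String, Long> versions = new HashMap<>();

        synchronized String get(String key) { return values.get(key); }

        synchronized long version(String key) { return versions.getOrDefault(key, 0L); }

        synchronized void set(String key, String value) {
            values.put(key, value);
            versions.merge(key, 1L, Long::sum);
        }

        /** Models EXEC of a queued SET: applies only if the watched key is untouched. */
        synchronized boolean execSet(String key, long watchedVersion, String value) {
            if (version(key) != watchedVersion) {
                return false; // watched key changed: the transaction is aborted
            }
            set(key, value);
            return true;
        }
    }

    /** Compare-and-swap from WATCH, GET, MULTI, SET, EXEC (null expected = key absent). */
    static boolean compareAndSwap(ToyRedis redis, String key, String expected, String update) {
        while (true) {
            long watched = redis.version(key);      // WATCH key
            String current = redis.get(key);        // GET key
            if (!Objects.equals(current, expected)) {
                return false;                       // UNWATCH; the value differs
            }
            if (redis.execSet(key, watched, update)) { // MULTI; SET key update; EXEC
                return true;
            }
            // EXEC was aborted by a concurrent write: loop, re-read and compare again.
        }
    }

    public static void main(String[] args) {
        ToyRedis redis = new ToyRedis();
        redis.set("refs/heads/master", "aaa");
        System.out.println(compareAndSwap(redis, "refs/heads/master", "aaa", "bbb")); // true
        System.out.println(compareAndSwap(redis, "refs/heads/master", "aaa", "ccc")); // false
        System.out.println(redis.get("refs/heads/master"));                           // bbb
    }
}
```

The extra network round-trip mentioned above is the GET: the client has to see the current value before it can decide whether to queue the SET.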
So we have a sub-optimal solution, but we don’t really want to fix it right now. In this situation, I try to hide the bad solution behind an interface, so it’s easy to replace in the future. Here, RedisKeyValueStore implements KeyValueStore: we can use the Redis implementation for now, and replace it with something better designed later. We can make progress, and it may turn out that it’s never worth replacing the “terrible” approach (YAGNI). For the same reason, I haven’t yet implemented the extra Redis functionality (WATCH / MULTI / EXEC) in our own store, and I’m just running against traditional non-cloud Redis. Let’s actually get something done!
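A minimal sketch of the "hide it behind an interface" idea, assuming a string-valued store. `KeyValueStore` and `RedisKeyValueStore` are the names from the post, but the methods below and the `InMemoryKeyValueStore` stand-in are hypothetical. Callers only see `compareAndSet`, so they cannot tell whether it is backed by a `ConcurrentHashMap`, by Redis' WATCH / MULTI / EXEC, or by something better designed later.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class KeyValueStoreDemo {

    /** The narrow interface callers depend on (method names are hypothetical). */
    interface KeyValueStore {
        Optional<String> get(String key);

        /** Atomically set key to update iff it holds expected (null = absent); update must be non-null. */
        boolean compareAndSet(String key, String expected, String update);
    }

    /** Stand-in implementation; a RedisKeyValueStore would implement the same interface. */
    static final class InMemoryKeyValueStore implements KeyValueStore {
        private final ConcurrentMap<String, String> map = new ConcurrentHashMap<>();

        @Override
        public Optional<String> get(String key) {
            return Optional.ofNullable(map.get(key));
        }

        @Override
        public boolean compareAndSet(String key, String expected, String update) {
            if (expected == null) {
                return map.putIfAbsent(key, update) == null; // create only if absent
            }
            return map.replace(key, expected, update);       // atomic conditional replace
        }
    }
}
```

Swapping implementations then only touches the line that constructs the store.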
So today, I built Git storage on the cloud, backed by OpenStack cloud storage. Cloud storage is typically eventually consistent, which is a much weaker guarantee than a traditional filesystem gives you. However, it turns out that Git is actually very lenient in what it requires of its storage (the Git design is excellent; it is essentially a well-implemented Merkle tree, and the genius was realizing that this was sufficient). Git stores blob data (containing the actual file data), and it stores references (which are just names pointing at the hash of the latest commit). The blob data is as big as the code commit (KBs or MBs), whereas a reference is less than a hundred bytes (a name and a hash). Further, the blob data is immutable (each object is named by the hash of its own contents, so a given name always refers to the same bytes), and thus needs almost no consistency guarantees from its storage, so we can easily store it on cloud storage. The reference data, though, must be stored and updated consistently, so it can’t easily be put onto object storage. That is why it has been non-trivial to host Git on the cloud. But we’ve built a consistent key-value store, which is perfectly suited to solve exactly this problem. Best of all, Google have implemented their Git storage exactly the same way, and open-sourced their code as part of JGit and Gerrit, so I didn’t even have to implement all the details of Git.
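The reason blob data tolerates eventual consistency is that Git names every object by a hash of its contents. The sketch below computes Git's object id for a blob (SHA-1 over the header `blob <size>` plus a NUL byte, followed by the content), which is the same value `git hash-object` reports. Because the name is derived from the bytes, two writers of the same name always write identical bytes, so a duplicated or stale-but-present copy is harmless. The class and method names are just for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class GitObjectIds {

    /** Git's id for a blob: SHA-1 of "blob <size>\0" followed by the raw content. */
    static String blobId(byte[] content) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(("blob " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
            sha1.update(content);
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 must be available on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        // Same result as: echo 'hello world' | git hash-object --stdin
        System.out.println(blobId("hello world\n".getBytes(StandardCharsets.UTF_8)));
        // prints 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
    }
}
```

A reference, by contrast, is a mutable name whose value changes on every push, which is exactly the part that needs compare-and-swap.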
Just as I hid Redis behind an interface, JGit has interfaces that mask its ‘terrible’ blob-store and reference-store implementations that use the filesystem. I believe Google maps both of these to BigTable. But we can map the blob-store to OpenStack Storage and the reference-store to the Redis protocol, now that we know we can implement the Redis protocol in a cloud-suitable way. JGit does some great caching, so this works wonderfully, even when I ran it against Rackspace’s Cloud Files product (which runs Keystone and Swift), storing data half-way across the US.
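Part of why caching hides cross-country latency so well is that immutable objects can be cached without any invalidation. The following is a hypothetical read-through LRU cache illustrating that property; it is not JGit's actual cache (JGit has its own block cache), and `remoteFetch` stands in for whatever call reads an object from object storage.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class ImmutableObjectCache {

    private final Map<String, byte[]> lru;
    private final Function<String, byte[]> remoteFetch; // hypothetical: e.g. a GET against object storage
    private int misses;

    ImmutableObjectCache(int maxEntries, Function<String, byte[]> remoteFetch) {
        this.remoteFetch = remoteFetch;
        // accessOrder=true makes LinkedHashMap track recency; evict the eldest once over capacity.
        this.lru = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Objects never change once written, so a cached copy never needs revalidating. */
    synchronized byte[] read(String objectId) {
        byte[] cached = lru.get(objectId);
        if (cached != null) {
            return cached;
        }
        misses++;
        byte[] fetched = remoteFetch.apply(objectId); // only misses pay the network round-trip
        lru.put(objectId, fetched);
        return fetched;
    }

    synchronized int misses() {
        return misses;
    }
}
```

References are the opposite case: they must always be read from the consistent store, which is fine because they are tiny.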
This is truly cloud Git: all the data is now stored redundantly on multiple machines / locations, and it uses cloud services via APIs. Swift is obviously great for cloud operations; our key-value store isn’t quite so far along, but architecturally it can get there. I think this also demonstrates what I mean by a cloud-first data-store: we’re using Keystone for authentication and Swift for data storage. Our key-value store (or something like it) will be a cloud service as well. It doesn’t have any authentication yet, but we’ll do what Swift does and integrate with Keystone, instead of building a second store of users.
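Here is a sketch of what "Keystone for authentication, Swift for storage" looks like on the wire, building (but not sending) the HTTP requests with `java.net.http`. A client presents its Keystone token to Swift in the `X-Auth-Token` header; a service that integrates with Keystone, as the key-value store eventually would, checks an incoming token by asking Keystone. The validation request assumes the Keystone v3 Identity API (`GET /v3/auth/tokens` with the caller's token in `X-Subject-Token`); the host names, container and object names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class KeystoneSwiftSketch {

    /**
     * Swift object PUT. storageUrl is the object-store endpoint Keystone's service
     * catalog returns (e.g. https://host/v1/AUTH_tenant); token is the user's token.
     */
    static HttpRequest swiftPut(String storageUrl, String token,
                                String container, String objectName, byte[] data) {
        return HttpRequest.newBuilder(URI.create(storageUrl + "/" + container + "/" + objectName))
                .header("X-Auth-Token", token)
                .PUT(HttpRequest.BodyPublishers.ofByteArray(data))
                .build();
    }

    /**
     * Token validation a Keystone-integrated service would perform (assumes the v3 API):
     * the service authenticates with its own token and names the caller's token
     * in X-Subject-Token; Keystone returns the token's details when it is valid.
     */
    static HttpRequest validateToken(String keystoneUrl, String serviceToken, String subjectToken) {
        return HttpRequest.newBuilder(URI.create(keystoneUrl + "/v3/auth/tokens"))
                .header("X-Auth-Token", serviceToken)
                .header("X-Subject-Token", subjectToken)
                .GET()
                .build();
    }
}
```

The point of the design is that both Swift and the key-value store trust the same token issuer, so there is one set of users and credentials for the whole cloud.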
Compare this to how GitHub has done it: they use a traditional filesystem to store their Git data, so to keep that data available they have to use a complicated DRBD architecture. Although I like DRBD, it is somewhat fragile, and things do go wrong. I think that the block-storage metaphor is not the right approach for the cloud: it fundamentally imposes a single-server mindset, and it’s difficult to get both high performance and high availability. (Amazon’s Elastic Block Store product is probably the most problematic piece of AWS, I think mostly because they favored high performance.)
The real issue is that GitHub have ended up with a complex and not-very-cloudy architecture; for example, they presumably shard their repositories across DRBD volumes, had to figure out how to live-migrate ‘hot’ repos, and had to implement all the disaster recovery themselves. GitHub is solving a lot of ‘infrastructure’ problems. I think those problems should be solved by the cloud, so that GitHub can have a much simpler, almost stateless, architecture of web-servers consuming well-tested cloud services. GitHub are running on the cloud, but they’re not really using cloud architectures. (That’s not really their fault, though: I don’t think this approach for storing Git data is very well known!)
The other big piece is making sure this is all open-source so companies like GitHub can use it confidently. We already have great open-source object storage, and hopefully by the end of the month we’ll be well on the way to great open-source structured data storage :-)