Playing with Redis 

Over the past few months, as I've been playing with different web technologies, I've dabbled with Redis. I really like the idea behind it: a super fast, all-in-memory key-value store (disk backed, so you don't lose everything in a crash, and you can still replicate, take backups, etc.).

I think what I like most about it is that it does one thing very, very well. It doesn't try to replace your relational database. It just focuses on being a super fast way to store little buckets of data. Now, granted, you can do really creative things with those little buckets of data, but, by and large, Redis is just unbelievably fast.

Today I had a chance to play with Redis using some pseudo-real data. I wanted to benchmark/simulate how it would perform at holding keywords, and then the references (ids) to the logs that contain those keywords. I'm imagining using this to keep track of support logs (hey, ftp has been mentioned in support contacts 15x today) or maybe to track server performance (server loads, which services are running, etc.). Tons of systems can do this. What Redis does really well is do it fast (have I mentioned that?) and offer really slick set operations (intersections, unions).

For instance, imagine you wanted to keep track of support contacts. You look for keywords coming in via email messages (ftp, mail, outage, whatever). That's easy. But now what if you wanted the overlap, where ftp was mentioned in the same email as cancel? That's harder. In MySQL, you might have to do some clever sub-selects or joins across a bunch of mapping tables. That works fine when you've got a small number of rows, but performance gets progressively worse as the data grows. You also probably end up writing some ridiculous queries just to keep the whole thing from going to disk and killing your performance entirely.

With Redis, it's dead simple. I mean seriously simple.

First, you install Redis. It's about the easiest thing I've done. Untar, make, run. I didn't even bother to make any config changes.
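If you've never done it, the whole thing amounts to roughly this (the exact tarball URL changes from release to release, so grab whatever the current one is):

wget http://download.redis.io/redis-stable.tar.gz
tar xzf redis-stable.tar.gz
cd redis-stable
make
src/redis-server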

Next, either via script or via the command line, you start adding data. For the support keyword example, you might do something like:

sadd ftp id_1
sadd ftp id_2
sadd ftp id_3
sadd ftp id_4
sadd cancel id_2
sadd cancel id_4

Basically, messages 1-4 mentioned FTP. Messages 2 and 4 also mentioned the word cancel.
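As a quick sanity check at any point, smembers dumps everything in a set:

smembers ftp

which comes back with id_1 through id_4.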

To get that intersection, you follow that up with:

sinter ftp cancel

And that's it: it tells you that id_2 and id_4 contain both words. Done and done.
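And if you want to hang onto that result instead of recomputing it, sinterstore writes the intersection into a new key of your choosing (ftp_and_cancel here is just a name I made up):

sinterstore ftp_and_cancel ftp cancel

After that, ftp_and_cancel is an ordinary set holding id_2 and id_4, ready for smembers or further intersections.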

Of course, that's a pretty simple example. I threw together a test script that generated 80 random keywords and 50,000 random message ids for each keyword. The ids were all drawn from the same range (1 to 999,999), so there was a decent chance of some overlap, but not a ton. That seemed like a pretty good test of Redis.
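If you want to try something similar yourself, a minimal version of that kind of script in Python, using the redis-py client, would look something like the sketch below. The keyword_N names are just placeholders, and the pipelining is my own choice to batch the inserts; raw one-command-at-a-time inserts will clock in differently.

import random
import time
import redis  # the redis-py client: pip install redis

r = redis.Redis(host='localhost', port=6379)

# 80 keywords, 50,000 random message ids apiece
for i in range(80):
    keyword = 'keyword_%d' % i
    start = time.time()
    pipe = r.pipeline()  # batch the sadds instead of one round trip each
    for _ in range(50000):
        pipe.sadd(keyword, 'id_%d' % random.randint(1, 999999))
    pipe.execute()
    print('%s loaded in %.2f seconds' % (keyword, time.time() - start))

# time an intersection between two of the sets
start = time.time()
overlap = r.sinter('keyword_0', 'keyword_1')
print('%d ids in both sets, found in %.4f seconds' % (len(overlap), time.time() - start))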

It took, on average, about 4.5 seconds to insert each batch of 50k entries, so the full 4 million entries took about 6 minutes. That's pretty darn fast.

But what's far more impressive is the intersection work. Getting the intersection between two of those sets (which usually came out to 1,500 to 2,500 entries) took …

0.05 seconds

Seriously.

And that, right there, is why Redis is f'ing awesome.