LightCloud adds support for Redis

Plurk's open-source cloud database LightCloud got a bit more powerful by supporting Redis.

Redis is yet another key-value database, but with some nice and curly twists:

  • it's persistent (but the whole dataset has to fit in memory)
  • it supports unique datatypes such as lists and sets
  • it can do some very interesting stuff like union and intersection between sets
  • it's very fast since everything is kept in memory
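
To give a feel for the set operations in the list above, here is a minimal sketch that uses Python's built-in sets as a stand-in. It does not talk to a Redis server, and the key names and members are made up for illustration. On a real server the corresponding commands are SADD, SINTER and SUNION.

```python
# Stand-in for two Redis sets; the keys and members are hypothetical.
# On a real server these would be filled with SADD.
followers = {
    "followers:alice": {"bob", "carol", "dave"},
    "followers:erin": {"carol", "dave", "frank"},
}

def sinter(store, *keys):
    """Members present in every named set (what SINTER returns)."""
    sets = [store.get(k, set()) for k in keys]
    return set.intersection(*sets) if sets else set()

def sunion(store, *keys):
    """Members present in at least one named set (what SUNION returns)."""
    return set().union(*(store.get(k, set()) for k in keys))

common = sinter(followers, "followers:alice", "followers:erin")
everyone = sunion(followers, "followers:alice", "followers:erin")
print(sorted(common))
print(sorted(everyone))
```

A missing key is treated as an empty set here, so an intersection that includes it comes back empty.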

How does it compare to Tokyo Tyrant?

LightCloud was initially built around Tokyo Tyrant, so a comparison between these two is inevitable.

In my first benchmarks it seemed that Redis was 7 to 10 times faster than Tokyo Tyrant, but after more tests I found that it's only somewhat faster. My benchmarks can be read in a section below. The bottom line, though, is that Redis is faster than Tokyo Tyrant.

The thing that makes Redis interesting is the extra data types such as sets and lists. It should be stated, though, that Tokyo Tyrant supports Lua scripting, which enables one to create custom datatypes (for example a list extension in Lua). Lua scripting is really powerful, but Redis's list operations are also nice to have. It's clear, though, that Tokyo Tyrant's Lua scripting offers more freedom.

In the database layer it's important to note that Redis has to keep the data in memory, while Tokyo Tyrant does not. This enables Redis to offer some powerful features, such as intersection between sets. A major problem with Redis's approach is that all the data must fit in memory (which means that Redis is not a good choice if you have lots of data).

On the scalability side Redis is weaker than Tokyo Tyrant, as Redis only supports master-slave replication while Tokyo Tyrant supports master-master replication.

The last remark is that Redis is a new product and there are some rough edges. Tokyo Tyrant is an older and well-tested product. Both products are under active development, though.

You can read more about Redis in the README file.

Benchmarks

The benchmark program outputs the following stats:

Finished "Tyrant set" 10000 times in 5.71 sec [1750.8 operations pr.sec]
Finished "Redis set" 10000 times in 3.64 sec [2749.5 operations pr.sec]
------
Finished "Tyrant get" 10000 times in 2.06 sec [4842.8 operations pr.sec]
Finished "Redis get" 10000 times in 1.75 sec [5701.0 operations pr.sec]
------
Finished "Tyrant list_add" 10000 times in 6.50 sec [1538.8 operations pr.sec]
Finished "Redis list_add" 10000 times in 5.41 sec [1849.3 operations pr.sec]
------
Finished "Tyrant delete" 10000 times in 15.88 sec [629.7 operations pr.sec]
Finished "Redis delete" 10000 times in 8.86 sec [1128.5 operations pr.sec]
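
The benchmark script itself isn't reproduced in this post. A timing loop that prints lines in the same format could look roughly like the sketch below. The `benchmark` helper and the in-process dict workload are assumptions made for illustration; the real runs called a Tokyo Tyrant or Redis node over the network.

```python
import time

def benchmark(label, func, times=10000):
    """Call func(i) `times` times and return a one-line summary."""
    start = time.perf_counter()
    for i in range(times):
        func(i)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    rate = times / elapsed if elapsed > 0 else float("inf")
    return 'Finished "%s" %d times in %.2f sec [%.1f operations pr.sec]' % (
        label, times, elapsed, rate)

# Hypothetical workload: a plain dict instead of a network round trip.
store = {}

def dict_set(i):
    store["key%d" % i] = "value"

line = benchmark("dict set", dict_set, times=1000)
print(line)
```

Swapping `dict_set` for a function that calls `lightcloud.set` would time the real thing, assuming the nodes are running.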

It's clear that Redis is faster, sometimes nearly 2x faster (about 1.8x on delete). One should note, though, that Redis does not hit disk, so it's really expected to be faster :-)
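
To put numbers on the difference, the short snippet below divides the Redis rate by the Tyrant rate for each operation. The figures are copied from the benchmark output above; nothing else is measured.

```python
# Operations per second from the benchmark output: (Tyrant, Redis).
ops_per_sec = {
    "set": (1750.8, 2749.5),
    "get": (4842.8, 5701.0),
    "list_add": (1538.8, 1849.3),
    "delete": (629.7, 1128.5),
}

# Ratio above 1.0 means Redis completed more operations per second.
speedups = {op: redis / tyrant for op, (tyrant, redis) in ops_per_sec.items()}

for op, ratio in speedups.items():
    print("%-8s Redis is %.2fx as fast as Tyrant" % (op, ratio))
```

The ratios range from about 1.18x on get to about 1.79x on delete.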

How to use it with LightCloud?

import lightcloud

LIGHT_CLOUD = {
    'lookup1_A': [ '127.0.0.1:10000' ],
    'storage1_A': [ '127.0.0.1:12000']
}
lookup_nodes, storage_nodes = lightcloud.generate_nodes(LIGHT_CLOUD)
lightcloud.init(lookup_nodes, storage_nodes, node_type=lightcloud.RedisNode)

def test_set_get():
    lightcloud.set('hello', 'world')
    assert lightcloud.get('hello') == 'world'

Are memory databases the future?

High Scalability has written some interesting pieces on memory databases that are worth a read.

Conclusion

Redis offers another take on a database, and Salvatore Sanfilippo seems to be driven by passion, which is important for any project.

It's clear that Redis is faster than Tokyo Tyrant, but currently I think that Tokyo Tyrant is a more mature product, so unless you need sets, Tokyo Tyrant seems to be the safer choice.

Personally, I really welcome the development of both products and a big kudos goes to Salvatore Sanfilippo and Mikio Hirabayashi for their amazing work.

Announcements · Code · Python 10. Jun 2009
23 comments so far

Hey, that's wicked cool. I've heard interesting things about Redis - but I don't think I realized it was so much faster. I'll look forward to seeing how it works.

Cheers!

Claudia

That's a great comparison, and done against a real-world usage. Very cool.

Btw about taking everything in memory or not, I suspect that if you run the test against a box with 300 million keys (you need 64GB of RAM to hold this number of keys in Redis) the performance difference between the two will be much greater. In general the disk access should start to be a bottleneck only when the dataset is starting to be pretty big.

Another interesting test is how TC performs when the dataset is much larger than the RAM and you access random keys. For example 3 times larger than RAM. I suspect that it starts to be much slower, so it's probably better to have as much RAM as the dataset size :)

Salvatore Sanfilippo:
I think you are right, and judging from my experience IO is the biggest bottleneck in any database, because the disk is so much slower than the memory or the CPU. Once the database begins to do heavy IO, it's game over for the performance.

This said, it really depends on a lot of factors and how the database is built and used. We have some million keys in our Tokyo Tyrant installation and the performance has so far been really great, even though the whole data set does not fit into TT's limited memory.

This said, there's no doubt that Redis is faster and a better choice if the dataset can fit into the memory and raw performance is super important.

I think that Tokyo Tyrant and Redis complement each other and do not necessarily compete against each other.

I'd think it would be cool to have a version of lightcloud that used redis for the lookup ring and tokyo for the storage ring. You can do super fast indexing with lists and sets in redis and keep the real persistent data in tokyo for a really nice combination.

This would play to the strengths of each storage system: redis would be all in memory, but there would not be as much data since it is just storing pointers to storage in tokyo, where you keep your data that is too big to fit into memory at once.

Ezra Zygmuntowicz:
Great idea, and one I had actually played around with. Currently, though, I need to battle-test Redis before I can encourage this usage.

The great thing about this feature, though, is that one can do much faster lookups and cleanups of the lookup cloud with Redis, so the overall performance of LightCloud on millions of keys could potentially be much better.

Best of both worlds ;-)

Those benchmark numbers look slow for both TT and Redis. It looks much slower than using either directly. I'm guessing the slow down is in the hashing and lookup ring?

Also I'm surprised how well the simulated list_add utilizing Lua performs against Redis' push. It could probably be sped up using _putcat if the list isn't over limit but that still requires a _get and loading the entire list into Lua.

That said I don't think it's a fair comparison to bench a marquee Redis feature against what is more Lua string operations than TT. It might be a better comparison storing a TCLIST object and using the Cabinet bindings for Lua assuming tclisttotable/tabletotclist are more efficient than de/encoding the current string based implementation you are using.

ActsAsFlinn:
Most of the time is spent in the network layer or the network code of Python. The lookup in the hash ring is peanuts compared to this; you can test it by running the hotshot profiler.

AFAIK TCLIST is not a supported type in Tokyo Tyrant's Lua API? Other than this, the string operations are actually pretty fast, but you are right that a benchmark against a real array structure would be quite interesting.

Hey Amir,

great to see you pick up Redis! I did it some weeks ago and already switched the tagging system on my site to use it. Although providing sets and lists in TT via Lua is good, having them out of the box with a common interface is just what makes Redis IMO importantly stand out of the key/value databases available. It requires a different way to think about data than e.g. with SQLAlchemy or RDBMS in general, but feels somewhat more natural to me as one is used to deal with such data structures as they are what a programming language usually provides itself.

I talked with ludo about some possible improvements for 'redis.py' and I'm looking forward to what you will provide (and it's nice what you have provided so far).

Don't you agree with me that it should use Unicode literals all over the place and make them the default expected argument type? Would allow for Unicode keys, better Py3k compatibility and dropping of unnecessary en-/decoding checks.

Also, I found that its behaviour of trying to guess the type of data returned by Redis may be harmful, e.g. when an ident string consists of only digits (could be a valid UUID or hash sum, I think, but also a random ident) and even might have leading zeros.

FYI, you can find me on Twitter as homeworkprod.

Jochen:
Hi Jochen

I think that embracing unicode is the way to go, especially with respect to Python3k and Python's future where all strings will be unicode.

Other than this, I also think that Redis support for lists and sets is interesting and I hope to see more of this :-)

What is the underlying store that you are using for Tyrant ?

- default (mdb)
- hash (hdb)
- btree (bdb)

I haven't yet installed LightCloud but planning to!

regards
jose

Jose:
As default it's a hash, but you can pick btree if you wish (by choosing .tcb file extension, instead of .tch).

Ok, that's what I thought!

Your comparison is not fair. You should compare Redis to TC,
using TCMDB as the master server and TCHDB or TCBDB as the slave server, to get both performance and persistence like Redis does.

jose:
Tokyo Cabinet is just a plain database, without a network layer and without replication... Tokyo Tyrant is a network adapter implemented on top of Tokyo Cabinet. I think there are some things you have misunderstood.

amix:

My wording was poor. I mean TC to include Tokyo Tyrant. What is valid is that you should compare Redis to:

TT with the master in memory (TCMDB) and the slave in disk (TCHDB and TCBDB). I confirmed this with TC author.

A different point:

TC is a big package and it's easy to get confused. E.g. TC provides consistent hashing, but I saw somewhere that you implement it in Python. TC author is quite responsive to new features, so you should provide feedback if some features don't fully meet your needs.

Also, I would find it more logical to use Lua rather than Python for TC/TT integration code, given TC/TT support for Lua.

Hope this helps clarify my response!

jose:
Tokyo Tyrant's performance is by no means bad, given that it hits the disk and disk is 100,000+ times slower than memory. I don't think this is an unfair benchmark - both Redis and Tokyo Tyrant have their strong sides.

amix:

Your benchmark compares apples to oranges !

TT hdb performance is good because it makes use of caching but will not be good when compared with a memory db. You are comparing a memory db with a disk hash db. TCMDB is also a hash db but entirely in memory.

Also note that TT default storage is TCMDB, which you are not using. TT has a good TCMDB example in senatus.lua

If I were TC author I would feel truly "insulted" with one of your first statements:

quote start
On my first benchmarks it seemed that Redis was 7 to 10 times faster than Tokyo Tyrant, but doing more tests I have found out that it's slightly faster.
quote end

Benchmarks lead to huge flame wars when not done carefully. I also think the multiple authors should be warned beforehand so that they can provide input to the benchmark.

You should at least try to correct the benchmark so that readers are not misled

thank you for taking into account my viewpoint (at least you didn't ban my post)

jose:
I state a lot of times that TT is disk based, while Redis isn't, and I also state the implications this has (namely that the memory is A LOT faster than disk, so it's expected that Redis is faster).

I think this benchmark is interesting because it compares a memory database vs. a disk based database, and the implications this has.

Currently LightCloud does not support TCMDB, so I did not have a chance to compare TT in this context. You are very welcome to do a benchmark and post it here.

Very clear and useful information. Thank you for sharing it.

Bok Amire,

Good info. Jose makes a good point. When publishing benchmarks you *always* want to have people deeply familiar with each tool provide input, ensure optimal settings, etc.

I know people sometimes benchmark Lucene and Solr (two projects I contribute to) without consulting us, and then they publish numbers that misrepresent Lucene or Solr. We, developers of those tools can see that, but people not familiar with the tools cannot, so they get a bad impression.

Anyhow, what I really wanted to ask about is support for using LightCloud from a Java application. How can one use LC from a Java app?

Hvala!

Cao Otis

I agree about benchmarks - the problem, though, is that this isn't a bad benchmark for Tokyo Tyrant (and I still use Tokyo Tyrant for all nodes). One must understand, though, the difference between memory-only storage and a storage engine that does not have to keep the whole data set in memory.

Regarding Java:
You'll need to implement hash_ring for Java and then port the client library of LightCloud.
You can get some inspiration on the Ruby version: http://mitchellhashimoto.com/l...

Regards,
Amir
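
For anyone attempting such a port, the core idea of a hash ring (consistent hashing) can be sketched in a few lines of Python. This is an illustration only and not the hash_ring code that LightCloud uses: the `HashRing` class, the replica count, the SHA-256 hash and the node names other than storage1_A are all assumptions.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent hash ring; an illustrative sketch only."""

    def __init__(self, nodes, replicas=100):
        self._ring = {}        # ring position -> node name
        self._positions = []   # sorted ring positions
        for node in nodes:
            # Several virtual points per node spread the keys more evenly.
            for i in range(replicas):
                pos = self._hash("%s:%d" % (node, i))
                self._ring[pos] = node
                bisect.insort(self._positions, pos)

    @staticmethod
    def _hash(key):
        # The hash only has to distribute keys evenly across the ring.
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def get_node(self, key):
        """Return the node owning the first ring position at or after the key."""
        if not self._positions:
            return None
        idx = bisect.bisect_left(self._positions, self._hash(key))
        if idx == len(self._positions):
            idx = 0  # wrap around to the start of the ring
        return self._ring[self._positions[idx]]

nodes = ["storage1_A", "storage2_A", "storage3_A"]
ring = HashRing(nodes)
node = ring.get_node("hello")
print(node)
```

The useful property is that adding a node only moves keys onto the new node; every other key keeps its previous owner.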

© 2000-2009 amix. Powered by Skeletonz.