memcached: Benchmark of 4 Python libraries

Memcached speed

Optimizations, don't we just love them! Unfortunately most micro optimizations aren't worth doing. The optimizations that are worth doing are those that affect everything... And if you use memcached, then memcached affects everything ;-) In this blog post I present a benchmark of the 4 most popular Python memcached libraries (one of them pure Python, the other 3 C wrappers).

As my benchmark shows, there is a lot to gain: you can speed up your memcached operations by roughly 2x, which is REALLY hard to do with any other optimization.

There are currently 4 Python memcached libraries and no good benchmarks of them, so I set out to benchmark them. The candidates, as they are named in the results below: memcache (python-memcache, the pure Python one), cmemcache, python-libmemcached and pylibmc.

Observations and changes:

  • I have applied a patch to all C libraries to make them run on Snow Leopard (I edited setup.py and removed rpath, which isn't supported by Leopard's linker). Every library is compiled for x86_64 (i.e. 64-bit).
  • I have patched pylibmc and python-libmemcached with support for serializing booleans
  • I have also fixed a 64-bit bug in pylibmc: it passed Py_ssize_t instead of size_t to mget
  • I will share these patches after I have done some more testing

Like every benchmark, this one should be taken with a grain of salt.

Benchmark program

This is a modified benchmark.py from python-libmemcached. It runs 10000 iterations of each command:

Benchmarking pylibmc_optimized...
test_set: 0.743668 seconds
test_set_get: 1.289444 seconds
test_random_get: 2.336701 seconds
test_set_same: 0.785587 seconds
test_set_big_object (100 objects): 0.014704 seconds
test_set_get_big_object (100 objects): 0.032860 seconds
test_set_big_string (100 objects): 0.009394 seconds
test_set_get_big_string (100 objects): 0.021033 seconds
test_get: 0.606791 seconds
test_get_big_object (100 objects): 0.012321 seconds
test_get_multi: 0.019260 seconds
Total_time is 5.871763
---
Benchmarking pylibmc...
test_set: 0.744818 seconds
test_set_get: 1.386534 seconds
test_random_get: 2.475867 seconds
test_set_same: 0.775607 seconds
test_set_big_object (100 objects): 0.013254 seconds
test_set_get_big_object (100 objects): 0.031905 seconds
test_set_big_string (100 objects): 0.009887 seconds
test_set_get_big_string (100 objects): 0.021890 seconds
test_get: 0.644991 seconds
test_get_big_object (100 objects): 0.011983 seconds
test_get_multi: 0.018810 seconds
Total_time is 6.135546
---
Benchmarking cmemcache...
test_set: 0.898636 seconds
test_set_get: 1.814076 seconds
test_random_get: 3.197659 seconds
test_set_same: 0.928649 seconds
test_set_big_object (100 objects): 0.014427 seconds
test_set_get_big_object (100 objects): 0.031279 seconds
test_set_big_string (100 objects): 0.010986 seconds
test_set_get_big_string (100 objects): 0.025449 seconds
test_get: 0.854429 seconds
test_get_big_object (100 objects): 0.013078 seconds
test_get_multi: 0.463271 seconds
Total_time is 8.251940
---
Benchmarking python-libmemcached...
test_set: 0.740007 seconds
test_set_get: 1.336759 seconds
test_random_get: 2.363844 seconds
test_set_same: 0.736221 seconds
test_set_big_object (100 objects): 0.013195 seconds
test_set_get_big_object (100 objects): 0.031755 seconds
test_set_big_string (100 objects): 0.010874 seconds
test_set_get_big_string (100 objects): 0.020221 seconds
test_get: 0.622201 seconds
test_get_big_object (100 objects): 0.011825 seconds
test_get_multi: 0.015463 seconds
Total_time is 5.902364
---
Benchmarking memcache...
test_set: 1.276277 seconds
test_set_get: 2.596438 seconds
test_random_get: 4.869392 seconds
test_set_same: 1.351409 seconds
test_set_big_object (100 objects): 0.057328 seconds
test_set_get_big_object (100 objects): 0.091957 seconds
test_set_big_string (100 objects): 0.018521 seconds
test_set_get_big_string (100 objects): 0.038375 seconds
test_get: 1.303581 seconds
test_get_big_object (100 objects): 0.028765 seconds
test_get_multi: 0.380600 seconds
Total_time is 12.012643
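
To make the shape of these numbers easier to read, here is a minimal sketch of the kind of timing loop that produces output like the above. It is not the actual benchmark.py from python-libmemcached: FakeClient is a hypothetical dict-backed stand-in so the sketch runs without a memcached server, and only three of the tests are shown. A real run would pass in a client object from one of the four libraries instead.

```python
import time


class FakeClient:
    """Hypothetical dict-backed stand-in exposing set/get/get_multi."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        self._store[key] = value
        return True

    def get(self, key):
        return self._store.get(key)

    def get_multi(self, keys):
        return {k: self._store[k] for k in keys if k in self._store}


def run_benchmark(client, iterations=10000):
    """Time each test once and return {test_name: seconds}."""
    keys = ["key%d" % i for i in range(iterations)]

    def test_set():
        for key in keys:
            client.set(key, "value")

    def test_get():
        for key in keys:
            client.get(key)

    def test_get_multi():
        # One round trip for all keys instead of one per key.
        client.get_multi(keys)

    timings = {}
    for test in (test_set, test_get, test_get_multi):
        start = time.perf_counter()
        test()
        timings[test.__name__] = time.perf_counter() - start
    timings["total"] = sum(timings.values())
    return timings


if __name__ == "__main__":
    for name, seconds in run_benchmark(FakeClient()).items():
        print("%s: %f seconds" % (name, seconds))
```

Because the client is passed in, the same loop can be pointed at each library in turn, which is what makes the totals comparable.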

pylibmc seems to be the fastest, especially with the tcp_nodelay=1 behavior enabled. In total time, pylibmc and python-libmemcached come out around 2 times faster than the pure Python implementation, and cmemcache around 1.5 times.
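
The speedup factors can be checked directly from the Total_time lines above. This is just arithmetic on the reported totals, with the pure Python memcache run as the baseline:

```python
# Total_time values copied from the benchmark output above (seconds).
totals = {
    "pylibmc_optimized": 5.871763,
    "pylibmc": 6.135546,
    "cmemcache": 8.251940,
    "python-libmemcached": 5.902364,
    "memcache": 12.012643,
}

# Speedup of each C library relative to the pure Python client.
baseline = totals["memcache"]
speedups = {
    name: baseline / seconds
    for name, seconds in totals.items()
    if name != "memcache"
}

for name, factor in sorted(speedups.items(), key=lambda kv: kv[1], reverse=True):
    print("%-20s %.2fx" % (name, factor))
```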

Test in a threaded environment

This is a test of these libraries in a threaded environment (basically a WSGI application that does 4 GET operations). For this to work, each library's client needs to be wrapped in a threading.local, so that every thread gets its own client.
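
A minimal sketch of the threading.local wrapping, under a few assumptions: DummyClient and get_client are hypothetical names used for illustration, and the server address is a placeholder. In the real test, the factory would be the client class of whichever library is being measured.

```python
import threading

# Each thread sees its own independent set of attributes on this object.
_local = threading.local()


class DummyClient:
    """Hypothetical placeholder for a real memcached client object."""

    def __init__(self, servers):
        self.servers = servers


def get_client(factory=DummyClient, servers=("127.0.0.1:11211",)):
    """Return the calling thread's client, creating it on first use."""
    client = getattr(_local, "client", None)
    if client is None:
        client = factory(list(servers))
        _local.client = client
    return client
```

A WSGI handler would then call get_client() on every request. Within one thread the same client is reused, while a different thread never touches it.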

The test does 1 warmup request and afterwards 1000 requests:

#python-memcache
Requests per second:    97.88 [#/sec] (mean)
Time per request:       10.217 [ms] (mean)

PID    COMMAND      %CPU TIME     #TH  #WQ  #PORTS #MREG RPRVT  RSHRD  RSIZE
50077  Python       0.0  00:09.15 9    0    63     237   31M    244K   34M 

#python-libmemcached
Requests per second:    82.05 [#/sec] (mean)
Time per request:       12.188 [ms] (mean)

PID    COMMAND      %CPU TIME     #TH  #WQ  #PORTS #MREG RPRVT  RSHRD  RSIZE
50101  Python       0.0  00:10.62 10   1    72     270   36M    244K   39M 

#cmemcache
Requests per second:    106.11 [#/sec] (mean)
Time per request:       9.425 [ms] (mean)

PID    COMMAND      %CPU TIME     #TH  #WQ  #PORTS #MREG RPRVT  RSHRD  RSIZE
50121  Python       0.0  00:08.58 9    0    62     237   31M    244K   34M 

#pylibmc_optimized
Requests per second:    108.09 [#/sec] (mean)
Time per request:       9.251 [ms] (mean)

PID    COMMAND      %CPU TIME     #TH  #WQ  #PORTS #MREG RPRVT  RSHRD  RSIZE
50043  Python       0.0  00:08.48 9    0    62     243   32M    244K   34M 

I have no clue why python-libmemcached performs so poorly in this test. This also shows that benchmarks should be treated as indicators and not as the truth ;-)

Conclusion

pylibmc seems to be the most promising library, probably because it's hand-coded and because it's based upon libmemcached. python-libmemcached looks promising in the simple benchmark, but is lacking in performance (and memory usage!) in a threaded environment (this could be related to PyRex, but I am unsure).

Looking at CPU and memory usage, python-libmemcached seems to use the most, while pylibmc uses the least CPU and cmemcache the least memory.

So in general I would recommend pylibmc or cmemcache. That said, it's best to do your own benchmarks based on your architecture(s) and your usage patterns.

[Update] Patch for pylibmc

The patch is available syntax highlighted or raw.

It patches the following:

  • Compiles for 64 bit under Snow Leopard
  • Uses size_t instead of Py_ssize_t. At the beginning I had segmentation faults, and I suspect they were caused either by the size of size_t or by a 32-bit vs. 64-bit mismatch. python-libmemcached had a similar issue with size_t. I am not a C programmer, so I don't really know if this is a big issue.
  • Adds support for storing and retrieving boolean values
  • Adds support for storing an empty string (""); this resulted in an error before
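
To illustrate what boolean and empty-string support involves: memcached stores a byte payload together with an integer flags field, and a client library can use the flags to remember which Python type it stored. The following is a pure Python sketch of that idea, not the C code from the patch. The flag values and the serialize/deserialize names are assumptions made up for this example.

```python
import pickle

# Hypothetical flag values, for illustration only.
FLAG_NONE = 0          # plain text string
FLAG_PICKLE = 1 << 0   # arbitrary pickled object
FLAG_INTEGER = 1 << 1
FLAG_BOOL = 1 << 4


def serialize(value):
    """Return (payload_bytes, flags) for a Python value."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return (b"1" if value else b"0"), FLAG_BOOL
    if isinstance(value, int):
        return str(value).encode("ascii"), FLAG_INTEGER
    if isinstance(value, str):
        # An empty string becomes an empty payload, which is still valid.
        return value.encode("utf-8"), FLAG_NONE
    return pickle.dumps(value), FLAG_PICKLE


def deserialize(payload, flags):
    """Rebuild the Python value from the payload and its flags."""
    if flags == FLAG_BOOL:
        return payload == b"1"
    if flags == FLAG_INTEGER:
        return int(payload)
    if flags == FLAG_PICKLE:
        return pickle.loads(payload)
    return payload.decode("utf-8")
```

Without a dedicated boolean flag, True would come back as the integer 1 (or fail to serialize at all), which is the kind of gap the patch closes.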

[Update] Patch for python_libmemcached

The patch adds the following:

  • Support for storing and retrieving boolean values
  • Compilation under Snow Leopard
3. Sep 2009 Benchmarks · Code · Python
© Amir Salihefendic