I am a huge fan of
memcached and we use it a lot on
Plurk.
Why to like memcached:
I have looked lightly into the internals of memcached to find out how it does its magic and what it makes it such an amazing choice for caching.
External libraries used
memcached uses these external libraries:
How memcached manages memory
Memcached does not use malloc/free for memory management, but a manual memory manager (implemented in slabs.c). The details and motivation of slabs are outlined in doc/memory_management.txt:
The primary goal of the slabs subsystem in memcached was to eliminate
memory fragmentation issues totally by using fixed-size memory chunks
coming from a few predetermined size classes (early versions of
memcached relied on malloc()'s handling of fragmentation which proved
woefully inadequate for our purposes). For instance, suppose
we decide at the outset that the list of possible sizes is: 64 bytes,
128 bytes, 256 bytes, etc. - doubling all the way up to 1Mb. For each
size class in this list (each possible size) we maintain a list of free
chunks of this size. Whenever a request comes for a particular size,
it is rounded up to the closest size class and a free chunk is taken
from that size class. In the above example, if you request from the
slabs subsystem 100 bytes of memory, you'll actually get a chunk 128
bytes worth, from the 128-bytes size class. If there are no free chunks
of the needed size at the moment, there are two ways to get one: 1) free
an existing chunk in the same size class, using LRU queues to free the
least needed objects; 2) get more memory from the system, which we
currently always do in _slabs_ of 1Mb each; we malloc() a slab, divide
it to chunks of the needed size, and use them.
The tradeoffs of memcached's memory management are also outlined:
The tradeoff is between memory fragmentation and memory utilisation. In
the scheme we're now using, we have zero fragmentation, but a relatively
high percentage of memory is wasted. The most efficient way to reduce
the waste is to use a list of size classes that closely matches (if
that's at all possible) common sizes of objects that the clients
of this particular installation of memcached are likely to store.
For example, if your installation is going to store hundreds of
thousands of objects of the size exactly 120 bytes, you'd be much better
off changing, in the "naive" list of sizes outlined above, the class
of 128 bytes to something a bit higher (because the overhead of
storing an item, while not large, will push those 120-bytes objects over
128 bytes of storage internally, and will require using 256 bytes for
each of them in the naive scheme, forcing you to waste almost 50% of
memory). Such tinkering with the list of size classes is not currently
possible with memcached, but enabling it is one of the immediate goals.
That is, they have tried to optimize memory fragmentation by implementing their own memory manager.
How memcachd expires cached items
Like you may know memcached supports cache expiration, for example here is a common usage of memcached in Python:
def get_data():
data = memcache().get('my_data')
if data:
return data
data = 'some data'
memcache().set('my_data', data, 900)
return data
Memcached will automatically expire 'my_data' cache after 900 seconds. And this automatic expiring makes memcached a bit more complex.
The expiring works like this:
-
When an item is requested memcached checks the expiration time to see if the item is expired before returning it to the client
[implemented in items.c/do_item_get_notedeleted]
-
When adding an item to the cache, memcached checks if the cache is full, if it is, it will look for expired items to replace. If no expired items are found, it will replace the least used items in the cache
[implemented in items.c/do_item_alloc]
Note that memcached expires cached items lazily - something that surprised me.
The virtues of memcached's design
I think memcached's design is inspiring, I particularity like following things:
-
It's implemented in C, making it really fast
-
In order to optimize memory they have implemented their own memory management - pretty good choice as it's memcached biggest bottleneck
-
It uses non-blocking IO which makes it a really amazing choice when combined with Nginx
-
Their expiring mechanism is simple and does not waste CPU time
I simply think they have attacked this the right way and memcached is an amazing product without much overhead or things that you don't need.