Consistent hashing: Improving distribution

Testing hash_ring's distribution of keys I found out that the distribution wasn't uniform. A big problem, since it would overload some nodes in the system.

I then took a look at how libketama implements hashing and found out that:

  • they add lots of virtual nodes. For example, adding 3 servers would add 360 points in the ring (120 pr. server). My implementation only replicated one node 4 times
  • they do binary operations to the md5 checksum which reduces the number of bits to around 10 (instead of around 40)

These two things messed up hash_ring's distribution. A simple test that tests random distribution of 100.000 words confirms this:

from hash_ring import HashRing

memcache_servers = ['192.168.0.246:11212',
                    '192.168.0.247:11212',
                    '192.168.0.249:11212']

ring = HashRing(memcache_servers)
iterations = 100000

def genCode(length=10):
    chars = string.letters + string.digits
    return ''.join([random.choice(chars) for i in range(length)])

def random_distribution():
    counts = {}
    for s in memcache_servers:
        counts[s] = 0

    for i in range(0, iterations):
        word = genCode(10)
        counts[ring.get_node(word)] += 1

    for k in counts:
        print '%s: %s' % (k, counts[k])

random_distribution()

Running it results in following distribution:

amixs$ python tests/random_distribution.py 
192.168.0.247:11212: 6868
192.168.0.246:11212: 20657
192.168.0.249:11212: 72475

Creating the ring with a lot more virtual points improves the distribution:

ring = HashRing(memcache_servers, replicas=120)

Re-running it yields:

amixs$ python tests/random_distribution.py 
192.168.0.247:11212: 35797
192.168.0.246:11212: 32258
192.168.0.249:11212: 31945

Much better. Performance gets worse as one of the lookups is O(n), where n is number of nodes in the ring. A fix thought is to use binary search from Python's bisect module, this eliminates the O(n) lookup and replaces it with a O(log(n)) lookup. On a 900 node ring the binary search improvement improves the speed from 40 seconds to only 5 seconds (testing again with 100.000 random words) - which is pretty nice speedup.

I have gone a bit longer and implemented the way libketama creates the ring. This makes hash_ring compatible with libketama, which is nice, since libketama is ported to Java, Erlang, Perl... This also means that one can assign node weights - this is pretty practical if you got a memcached server with 2GB of memory, while the others only have 1GB:

memcache_servers = ['192.168.0.246:11212',
                    '192.168.0.247:11212',
                    '192.168.0.249:11212']
weights = {
    '192.168.0.246:11212': 1,
    '192.168.0.247:11212': 2,
    '192.168.0.249:11212': 1
}

ring = HashRing(memcache_servers, weights)

Running the weighted ring yields following result:

amixs$ python tests/random_distribution.py 
192.168.0.247:11212: 50806
192.168.0.246:11212: 22244
192.168.0.249:11212: 26950

How adding/removing of servers affects distribution

The main reason we have consistent hashing is so removing / adding nodes does not re-assign all the keys, and this is crucial to test as well. I created another test that uses Peter Norvig's World's Longest Palindrome Sentence as the input data to test how the words get re-distributed as we add extra nodes to the system. Output of the test is following:

Distribution of server_sets_3::
192.168.0.247:11212: 4609
192.168.0.248:11212: 3961
192.168.0.246:11212: 3827

Distribution of server_sets_5::
192.168.0.247:11212: 2612
192.168.0.248:11212: 2472
192.168.0.250:11212: 2254
192.168.0.246:11212: 2525
192.168.0.249:11212: 2534

Testing how alike server_sets_3 are to the sets in server_sets_5::
192.168.0.247:11212: 4609 in init_set
192.168.0.247:11212: 2612 in new_set
192.168.0.247:11212: 2612 in both init_set and new_set

192.168.0.248:11212: 3961 in init_set
192.168.0.248:11212: 2472 in new_set
192.168.0.248:11212: 2472 in both init_set and new_set

192.168.0.246:11212: 3827 in init_set
192.168.0.246:11212: 2525 in new_set
192.168.0.246:11212: 2525 in both init_set and new_set

The distribution looks pretty good as we keep adding nodes! Testing code for this is following:

from hash_ring import HashRing

text = open('tests/palindromes.txt').read()
text = text.replace('\n', '').replace('a ', '').replace('an ', '')
palindromes = [t.strip() for t in text.split(',')]

#--- Helper functions ----------------------------------------------
def create_sets(servers):
    server_sets = {}
    for s in servers:
        server_sets[s] = set()

    ring = HashRing(servers)
    for word in palindromes:
        node = ring.get_node(word)
        server_sets[node].add(word)

    return server_sets

def print_distributions(name, server_sets):
    print '\nDistribution of %s::' % name
    for s in server_sets:
        print '%s: %s' % (s, len(server_sets[s]))

def print_set_info(servers_init, servers_new):
    for init_server in servers_init:
        init_set = servers_init[init_server]
        new_set = servers_new[init_server]

        print ''
        print '%s: %s in init_set' %\
                (init_server, len(init_set))
        print '%s: %s in new_set' %\
                (init_server, len(new_set))
        print '%s: %s in both init_set and new_set' %\
                (init_server, len(init_set.intersection(new_set)))

#--- Testing ----------------------------------------------
init_servers = ['192.168.0.246:11212',
                '192.168.0.247:11212',
                '192.168.0.248:11212']
server_sets_3 = create_sets(init_servers)

print_distributions('server_sets_3', server_sets_3)

extra_servers = ['192.168.0.246:11212',
                 '192.168.0.247:11212',
                 '192.168.0.248:11212',
                 '192.168.0.249:11212',
                 '192.168.0.250:11212']
server_sets_5 = create_sets(extra_servers)

print_distributions('server_sets_5', server_sets_5)

print_set_info(server_sets_3, server_sets_5)

The new version of hash_ring is out with these improvements.

23. Nov 2008 Code · Python
© Amir Salihefendic