Do complex problems demand complex solutions?

Do complex problems demand complex solutions? Personally, I don't think so. I think people are blinded by complex problems, in the sense that they can only see complex solutions. There are also many cases where complex problems are solved with complex solutions and simple problems are solved with complex solutions. Let's look at some examples of simple solutions for complex problems.

Map-reduce

Do you need to solve something in parallel? Then using Map-Reduce is a pretty good idea. And playing around with map-reduce isn't that hard, here is a barebone implementation in Erlang:

pmap(F, L) -> 
    S = self(),
    Pids = map(fun(I) -> 
		       spawn(fun() -> do_f(S, F, I) end)
	       end, L),
    gather(Pids).

gather([H|T]) ->
    receive
	{H, Ret} -> [Ret|gather(T)]
    end;
gather([]) ->
    [].

do_f(Parent, F, I) ->					    
    Parent ! {self(), (catch F(I))}.

Spell checking

Google's director of reasearch, Peter Norvig, has created an amazing spell checker in only 21 lines of Python code. Given enough data, his spell checker beats spell checkers like Aspell - - since he solution uses statistics, where Aspell and the like use dictionary lookups. They have used the same basic idea for Google Translate, which is one of the best translator today.

His spell checker implementation:

import re, collections

def words(text): return re.findall('[a-z]+', text.lower()) 

def train(features):
    model = collections.defaultdict(lambda: 1)
    for f in features:
        model[f] += 1
    return model

NWORDS = train(words(file('big.txt').read()))

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def edits1(word):
    n = len(word)
    return set([word[0:i]+word[i+1:] for i in range(n)] +                     
               [word[0:i]+word[i+1]+word[i]+word[i+2:] for i in range(n-1)] +
               [word[0:i]+c+word[i+1:] for i in range(n) for c in alphabet] +
               [word[0:i]+c+word[i:] for i in range(n+1) for c in alphabet])

def known_edits2(word):
    return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in NWORDS)

def known(words): return set(w for w in words if w in NWORDS)

def correct(word):
    candidates = known([word]) or known(edits1(word)) or known_edits2(word) or [word]
    return max(candidates, key=lambda w: NWORDS[w])

QuickSort

QuickSort is an sorting algorithm, it's worst case running time is O(n^2) [that's really bad]. But the average running time is O(n*log(n)) and in most cases it's a lot faster than other O(n*log(n)) algorithms, hence it's used as the standard sorting algorithm in most sorting libraries. QuickSort implemented in Haskell:

qsort []     = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)

It's a simple solution with a dumb worst case running time that in most cases beats all the other (and more complex) algorithms.

Memcached

Memcached has changed the way we view and use caching. It's simple to use, it's simple to scale and it's a simple solution for a "complex" problem. Some time ago I reviewed memcached's implementation, check it out for more info.

Distributed hash table (DHT)

I am in the process of developing a distributed hash table. I have looked at a _lot_ of solutions for this problem, the largest of them are following:

  • Chord: P2P network that implements a DHT
  • Amazon Dynamo: Amazon's DHT that they use for Amazon S3 and storing sessions, customer carts, product recommendations etc.
  • Scalaris: Erlang DHT that promises scalability and performance
  • CouchDB: Distributed document database that can be used as a DHT
  • Cassandra, Hbase, Bigtable: Much more complex systems, but can be used as DHT's
  • ...

The problem thought is that most of the solutions are _really_ complex and I don't think one needs a complex solution to solve the DHT problem.

Here is a get and set method of my DHT:

def get(key):
    """Lookup's the storage node in the
    lookup ring and return's the stored value"""
    storage_node = locate_node(key)
    if storage_node:
        return storage_node.get(key)
    return None

def delete(key):
    """Lookup's the storage node in the lookup ring
    and deletes the key from lookup ring and storage node"""
    storage_node = locate_node(key)

    for i, lookup_node in enumerate(lookup_ring.iterate_nodes(key)):
        if i > 2:
            break
        lookup_node.delete(key)

    if storage_node:
        storage_node.delete(key)
    return True

I look at my solution and think: damn, this is really simple, why didn't the others think of it?

Code · Python 26. Nov 2008
© Amir Salihefendic. Powered by Skeletonz.