Configuring Apache and CherryPy to handle a Digg effect

This little guide shows how to tune Apache and CherryPy so they can handle a Digg effect, the kind of traffic spike a much bigger site would normally get.

Yesterday I got around 16000 page views and 13000 unique visits on Orangoo Spell Check. Here is a graph of the traffic per hour (it only covers the first 17 hours).

[Graph: Orangoo visits]

16000 page views isn't that much, but if your setup isn't configured properly, then it is! Apache died twice before I figured out that my configuration was bad.

The system setup

I have the following shared hosting setup:

Memory: 160MB
Disk size: 6000MB
CPU: 160 Units
OS: Debian Woody
System: Xen 3

Not impressive, but it's actually OK. And it could handle a Digg effect, so that's cool.

Running CherryPy behind Apache

It would be wise to run CherryPy behind Apache. Apache is lightning fast and well tested under huge loads. Here is a guide from the CherryPy docs:

Apache 1 or Apache 2?

I first went with Apache 2, but I regretted it. Apache 2 is more complex and harder to configure. If you don't use any Apache 2 specific features, then I would advise going with Apache 1.

Don't use prebuilt versions of Apache. Compile Apache yourself for best performance, and enable only the modules that you actually use.

Compiling Apache 1

Here is how I configured and built it:

./configure \
  --enable-module=rewrite \
  --enable-module=proxy \
  --disable-module=userdir \
  --disable-module=auth \
  --disable-module=include \
  --disable-module=cgi \
  --disable-module=env
make
make install

If you use the CherryPy and proxy trick, then you'll have to turn rewrite and proxy on. Notice that CGI is turned off.

Apache configuration

Basically you only need to adjust some things to get Apache configured properly.

Here are my main optimizations to the httpd.conf:

MaxKeepAliveRequests 0
KeepAliveTimeout 15
MinSpareServers 15
MaxSpareServers 50
StartServers 15
MaxClients 256
MaxRequestsPerChild 0

You can read more about those parameters in two excellent articles on Apache configuration:

CherryPy configuration

This is really easy. In your configuration set the following:

server.thread_pool: 100
server.socket_queue_size: 30

Basically the server is going to start 100 threads. Queue size means that if all threads are busy then CherryPy will queue up to 30 requests.
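
To make the interplay of the two settings concrete, here is a small sketch in Python. It is a simplified model, not CherryPy code: the function name classify_burst is made up for this example, and it assumes all requests arrive at the same instant. As far as I know the queue size ends up as the listen backlog of the server socket, so the real cut-off depends on the operating system.

```python
def classify_burst(concurrent_requests, thread_pool=100, socket_queue_size=30):
    """Split a burst of simultaneous requests into (served, queued, refused).

    Simplified model: each thread serves one request, the queue holds the
    overflow, and everything beyond that is refused.
    """
    served = min(concurrent_requests, thread_pool)
    queued = min(concurrent_requests - served, socket_queue_size)
    refused = concurrent_requests - served - queued
    return served, queued, refused


if __name__ == "__main__":
    for burst in (50, 120, 200):
        print(burst, classify_burst(burst))
```

With the values above, a burst of 120 requests keeps all 100 threads busy and leaves 20 requests waiting in the queue.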

Monitoring your CherryPy server

How do you monitor your CherryPy server? Don't sleep, that's it!

Nah, you use Supervisor to monitor your CherryPy server. It's super easy to use and written in Python, but has very little documentation. Luckily Titus Brown has written an excellent guide to get you started:

Notice: Supervisor can also be used to monitor any process (Apache, MySQL etc.)
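
If you wonder what such a monitor does at its core, here is a minimal sketch of the idea in plain Python. This is not Supervisor and does not use its API. The function name keep_alive and its parameters are made up for this example, and the crashing command is only a stand-in for your real server start command.

```python
import subprocess
import sys
import time


def keep_alive(command, max_restarts=3, delay_seconds=0.1):
    """Run command and restart it whenever it exits with a non-zero code.

    Returns the number of times the command was started.
    """
    starts = 0
    while True:
        starts += 1
        exit_code = subprocess.call(command)
        if exit_code == 0:
            break  # clean exit: nothing to restart
        if starts > max_restarts:
            break  # give up so a broken process does not loop forever
        time.sleep(delay_seconds)
    return starts


if __name__ == "__main__":
    # Stand-in for the real server: a process that always fails.
    crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(keep_alive(crashing, max_restarts=2))
```

A real monitor also handles logging, daemonizing and several processes at once, which is why a ready-made tool is the better choice in production.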

Unite your static files

I forgot an important part: unite your JavaScript and CSS files into two large files. My Apache server got 300,000 requests that day (even though I had united all my JS+CSS).

I had 8 static files which I concatenated into 2. On 15000 visits that's around 100000 requests saved. Here is a little Python script that can do the dirty work for you:

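import os

def minifyAndUnite(files, output_file):
  """Strip leading whitespace from every line and join all files into one."""
  full_text = []
  for path in files:
    # "with" closes each file as soon as it has been read
    with open(path, "r") as source:
      full_text.extend(line.lstrip() for line in source)
    # Make sure the next file starts on its own line
    if full_text and not full_text[-1].endswith("\n"):
      full_text.append("\n")
  with open(output_file, "w") as target:
    target.write("".join(full_text))

# What files? (paths are relative to the current working directory)
css = ['static/main.css', 'googiespell/googiespell.css',
   'greybox/greybox.css']
js = ['googiespell/AmiJS.js', 'googiespell/cookiesupport.js',
   'googiespell/googiespell.js', 'greybox/greybox.js']

# Get full path
cwd = os.getcwd()
full_path_css = [os.path.join(cwd, fp) for fp in css]
full_path_js = [os.path.join(cwd, fp) for fp in js]

# Minify and store
minifyAndUnite(full_path_css, os.path.join(cwd, "static", "css_generated.css"))
minifyAndUnite(full_path_js, os.path.join(cwd, "static", "js_generated.js"))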
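
The number of saved requests mentioned before the script can be checked with a quick calculation. This sketch assumes that every visit fetches every static file exactly once (no browser caching); the function name requests_saved is made up for this example.

```python
def requests_saved(visits, files_before, files_after):
    """Requests avoided when each visit loads fewer static files."""
    return visits * (files_before - files_after)


if __name__ == "__main__":
    # 15000 visits, 8 static files united into 2
    print(requests_saved(15000, 8, 2))  # prints 90000
```

That is 6 requests less per visit, which is where the figure of roughly 100000 comes from.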

Know your resources (update)

I found out that my configuration of both Apache and CherryPy was too greedy. I expected too much from so little CPU and memory. I created 100 CherryPy threads, but this was way too optimistic. The problem arises if they all become active at the same time: can the server handle the load? Mine couldn't. It's better to save resources and have fewer threads.

The best way to solve this is testing and calculation. Find out how expensive one request is and how many you can handle concurrently. You need to do this for Apache, CherryPy and anything else they might use (like Aspell...).
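
Here is a rough sketch of such a calculation for memory. Only the 160MB total comes from my setup; the reserved amount and the memory per worker are placeholder assumptions, so measure your own processes before trusting any result. The function name max_concurrent_workers is made up for this example.

```python
def max_concurrent_workers(total_mb, reserved_mb, per_worker_mb):
    """Return how many workers fit into the memory left after the reserve."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    usable_mb = max(total_mb - reserved_mb, 0)
    return int(usable_mb // per_worker_mb)


if __name__ == "__main__":
    TOTAL_MB = 160      # from the system setup
    RESERVED_MB = 40    # assumption: the OS and other services
    PER_WORKER_MB = 3   # assumption: replace with a measured value
    print(max_concurrent_workers(TOTAL_MB, RESERVED_MB, PER_WORKER_MB))
```

The same kind of estimate can be repeated for CPU time per request. The point is to set the limits from measured numbers instead of guessing.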

My current configuration

Apache:

MaxKeepAliveRequests 0
KeepAliveTimeout 15
MinSpareServers 20
MaxSpareServers 40
StartServers 25
MaxClients 256
MaxRequestsPerChild 0

CherryPy:

server.thread_pool: 40
server.socket_queue_size: 15
Code · Orangoo 18. Apr 2006
© Amir Salihefendic. Powered by Skeletonz.