Configuring Apache and CherryPy to handle a Digg effect

This short guide explains how to tune Apache and CherryPy so they can survive a traffic spike from a much bigger site (the so-called Digg effect).

Yesterday I got around 16,000 page views and 13,000 unique visits on Orangoo Spell Check. Here is a graph of traffic per hour (covering only the first 17 hours):

[Graph: Orangoo visits]

16,000 page views isn't that much, but it is if your setup isn't configured properly! Apache died twice before I figured out that my configuration was bad.

The system setup

I have the following shared hosting setup:

Memory: 160MB
Disk size: 6000MB
CPU: 160 Units
OS: Debian Woody
System: Xen 3

Not impressive, but it's actually OK, and it could handle the Digg effect, so that's cool.

Running CherryPy behind Apache

It would be wise to run CherryPy behind Apache. Apache is lightning fast and well tested under huge loads. Here is a guide from the CherryPy docs:

Apache 1 or Apache 2?

I first went with Apache 2, but I regretted it. Apache 2 is more complex and harder to configure. If you don't need any specific Apache 2 features, I would advise going with Apache 1.

Don't use prebuilt versions of Apache; compile Apache yourself for the best performance, enabling only the modules you actually use.

Compiling Apache 1

Here is how I configured and built it:

./configure \
  --enable-module=rewrite \
  --enable-module=proxy \
  --disable-module=userdir \
  --disable-module=auth \
  --disable-module=include \
  --disable-module=cgi \
  --disable-module=env
make
make install

If you run CherryPy behind Apache using the proxy trick, you have to enable the rewrite and proxy modules. Notice that CGI is turned off.

Apache configuration

You only need to adjust a few settings to get Apache configured properly.

Here are my main optimizations to httpd.conf:

MaxKeepAliveRequests 0
KeepAliveTimeout 15
MinSpareServers 15
MaxSpareServers 50
StartServers 15
MaxClients 256
MaxRequestsPerChild 0
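
To make the spare-server settings less abstract, here is a simplified Python model of how a prefork Apache parent reacts to its pool of idle children. It is only a sketch: the function name is hypothetical, and real Apache spreads spawning and killing over time instead of adjusting in a single step. For reference, StartServers is just the number of children created at startup, and MaxClients caps how many children can exist at once.

```python
def prefork_adjustment(idle, busy, min_spare, max_spare, max_clients):
    """Simplified model of prefork spare-server logic (hypothetical helper).

    Returns how many children to spawn (positive), how many idle children
    to kill (negative), or 0 if the pool is within bounds.
    """
    total = idle + busy
    if idle < min_spare:
        # Too few idle children: top up to MinSpareServers,
        # but never exceed MaxClients children in total
        return max(0, min(min_spare - idle, max_clients - total))
    if idle > max_spare:
        # Too many idle children: trim back down to MaxSpareServers
        return max_spare - idle
    return 0


# Using the values above: MinSpareServers 15, MaxSpareServers 50, MaxClients 256
print(prefork_adjustment(idle=5, busy=100, min_spare=15, max_spare=50, max_clients=256))   # 10
print(prefork_adjustment(idle=60, busy=10, min_spare=15, max_spare=50, max_clients=256))   # -10
print(prefork_adjustment(idle=0, busy=256, min_spare=15, max_spare=50, max_clients=256))   # 0
```

The last call shows why MaxClients matters on a small box: once that many children exist, new requests have to wait no matter how many spare servers you asked for.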

You can read more about those parameters in two excellent articles on Apache configuration:

CherryPy configuration

This is really easy. In your configuration, set the following:

server.thread_pool: 100
server.socket_queue_size: 30

Basically, the server is going to start 100 worker threads. The queue size is the backlog of the listening socket: roughly, how many incoming connections (30 here) the operating system will hold while they wait to be accepted by CherryPy.
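
To get a feel for how a fixed number of worker threads and a bounded waiting line interact, here is a toy model built on Python's queue and threading modules. It is a simplification, not CherryPy's implementation: in CherryPy the waiting connections sit in the operating system's listen backlog rather than in a Python queue, and the class name here is hypothetical.

```python
import queue
import threading


class ToyThreadPool:
    """Toy model: N worker threads plus a waiting line of limited size."""

    def __init__(self, threads, queue_size):
        self.waiting = queue.Queue(maxsize=queue_size)
        for _ in range(threads):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            job = self.waiting.get()  # block until a job is queued
            try:
                job()
            finally:
                self.waiting.task_done()

    def submit(self, job):
        """Queue a job; return False (request refused) if the line is full."""
        try:
            self.waiting.put_nowait(job)
            return True
        except queue.Full:
            return False


# Demo with tiny numbers: 2 threads, room for 3 waiting requests
release = threading.Event()
started = threading.Semaphore(0)

def slow_request():
    started.release()  # signal that a worker picked this job up
    release.wait()     # pretend the request takes a long time

pool = ToyThreadPool(threads=2, queue_size=3)
pool.submit(slow_request)
pool.submit(slow_request)
started.acquire()
started.acquire()      # both workers are now busy
print([pool.submit(slow_request) for _ in range(4)])  # [True, True, True, False]
release.set()          # let every request finish
pool.waiting.join()
```

Once every thread is busy and the waiting line is full, the next request is refused. More threads only help if the machine can actually run them all at once.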

Monitoring your CherryPy server

How do you monitor your CherryPy server? Don't sleep, that's it!

Nah, you use supervisor to monitor your CherryPy server. It's super easy to use and written in Python, but it has very little documentation. Luckily, Titus Brown has written an excellent guide to get you started:

Note: supervisor can monitor any process, not just CherryPy (Apache, MySQL, etc.).
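
The core idea of a process monitor fits in a few lines. Below is a toy watchdog using Python's subprocess module; it is not how supervisor is implemented, and the function and parameter names are made up for illustration. It restarts a process whenever it exits, but gives up when the process keeps dying right after starting, similar in spirit to the "restarting too frequently; quit" message in the supervisor log quoted in the comments below.

```python
import subprocess
import sys
import time


def watch(cmd, max_quick_deaths=3, min_uptime=1.0, restart_delay=1.0):
    """Toy watchdog (hypothetical helper, not supervisor's code).

    Starts cmd, waits for it to exit and starts it again. If the process
    dies within min_uptime seconds max_quick_deaths times in a row, the
    watchdog gives up. Returns the number of times cmd was started.
    """
    starts = 0
    quick_deaths = 0
    while True:
        started_at = time.monotonic()
        proc = subprocess.Popen(cmd)
        starts += 1
        returncode = proc.wait()
        uptime = time.monotonic() - started_at
        print(f"pid {proc.pid} exited with status {returncode} after {uptime:.1f}s")
        if uptime < min_uptime:
            quick_deaths += 1
            if quick_deaths >= max_quick_deaths:
                print("restarting too frequently; giving up")
                return starts
        else:
            quick_deaths = 0  # it ran long enough, so reset the counter
        time.sleep(restart_delay)


# A "server" that exits immediately is abandoned after 3 quick deaths
print(watch([sys.executable, "-c", "pass"], min_uptime=30.0, restart_delay=0))
```

A guard like this is why a program that exits right away (even with status 0) ends up being abandoned rather than restarted forever.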

Unite your static files

I forgot an important part: unite your JavaScript and CSS files into two large files. My Apache server got 300,000 requests that day (even though I had already united all my JS and CSS).

I had 8 static files, which I concatenated into 2. That saves 6 requests per visit, so on 15,000 visits it's 15,000 × 6 = 90,000, or roughly 100,000 requests saved. Here is a little Python script that can do the dirty work for you:

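import os

def minifyAndUnite(files, output_file):
    """Concatenate files into output_file, stripping leading
    whitespace from each line (a very naive minification)."""
    full_text = []
    for path in files:
        with open(path, "r") as f:
            full_text.extend(line.lstrip() for line in f)
        # Start the next file on a fresh line, so the last line of one
        # file is never glued to the first line of the next
        if full_text and not full_text[-1].endswith("\n"):
            full_text.append("\n")
    with open(output_file, "w") as out:
        out.write("".join(full_text))

# What files?
css = ['static/main.css', 'googiespell/googiespell.css',
       'greybox/greybox.css']
js = ['googiespell/AmiJS.js', 'googiespell/cookiesupport.js',
      'googiespell/googiespell.js', 'greybox/greybox.js']

# Get full paths (relative to the current working directory)
cwd = os.getcwd()
full_path_css = [os.path.join(cwd, fp) for fp in css]
full_path_js = [os.path.join(cwd, fp) for fp in js]

# Minify and store
minifyAndUnite(full_path_css, os.path.join(cwd, "static", "css_generated.css"))
minifyAndUnite(full_path_js, os.path.join(cwd, "static", "js_generated.js"))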

Know your resources (update)

I found out that my configuration of both Apache and CherryPy was too greedy: I expected too much from so little CPU and memory. I created 100 CherryPy threads, but this was way too optimistic. The problem arises if they all become active at the same time: can the server handle the load? Mine couldn't. It's better to conserve resources and run fewer threads.

The best way to solve this is testing and calculation: find out how expensive one request is and how many requests you can handle concurrently. You need to do this for Apache, for CherryPy, and for anything else they depend on (like Aspell).
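
A back-of-the-envelope version of that calculation might look like the Python below. It is a sketch under assumptions: the helper names are hypothetical, every number is an invented placeholder rather than a measurement from this server, and it uses Little's law (requests in flight = arrival rate × time per request) as one reasonable way to estimate how much concurrency you actually need.

```python
def max_concurrent(total_mb, reserved_mb, per_request_mb):
    """How many requests fit in memory at once (hypothetical helper).

    total_mb:       RAM on the box
    reserved_mb:    RAM needed by everything else (OS, database, base processes)
    per_request_mb: extra RAM one busy worker needs while serving a request
    """
    if per_request_mb <= 0:
        raise ValueError("per_request_mb must be positive")
    return max(0, int((total_mb - reserved_mb) // per_request_mb))


def needed_concurrency(requests_per_sec, seconds_per_request):
    """Little's law: average requests in flight = arrival rate * time per request."""
    return requests_per_sec * seconds_per_request


# Made-up numbers for a 160MB box; measure your own!
capacity = max_concurrent(total_mb=160, reserved_mb=60, per_request_mb=4)
demand = needed_concurrency(requests_per_sec=20, seconds_per_request=0.5)
print(capacity, demand)    # 25 10.0
print(demand <= capacity)  # True
```

Plug in your own measured memory per request and request times. If the estimated demand is far below what memory allows, there is little reason to configure hundreds of workers.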

My current configuration

Apache:

MaxKeepAliveRequests 0
KeepAliveTimeout 15
MinSpareServers 20
MaxSpareServers 40
StartServers 25
MaxClients 256
MaxRequestsPerChild 0

CherryPy:

server.thread_pool: 40
server.socket_queue_size: 15
Code · Orangoo 18. Apr 2006
10 comments so far

Overall nice set of tips!

I'm using Debian Sarge, and it's pretty easy to turn modules on and off via symlinks in /etc/apache2.

By the way, if your static files are being served directly from Apache and you're using HTTP/1.1, saving those 100,000 requests is probably less of a win than you might think. There will undoubtedly be a little less load, but I don't think it would make a huge difference to the server. Apache serves up files quickly and HTTP/1.1 will reuse the same connection for the files that are needed.

Thanks Kevin.

Regarding the HTTP/1.1 you are absolutely right :)
I forgot about that.

Nice writeup. Are you using #unixshell for your Xen based hosting? I am looking into moving over to them, so if you are using them, I'd be interested to hear your opinions about performance/service/etc.

Hey Christian

I can highly recommend unixshell: super cheap and super fast. I just upgraded my server :) But of course, you pretty much have to set up everything yourself, which IMO is great :) It can be a pain in the ass sometimes, though.

Another webhost that I can recommend (Christian already knows this service ;]): python-hosting for super Python hosting. Their support is also superb.

PS: I will update this guide. I learned the hard way that this configuration is too greedy.

Cool news about #unixshell. What plan did you start out at? I am considering "The 96". I just need to host email for my wife and me and host my *.dowski.com websites. That might be pushing 96MB of RAM, but I need to keep it cheap :-)

Hi again Christian

I started with the 160. I think 96 is pretty fine for your use.

thanks for your tips, great!

Nice write-up - a good contribution to the community. Thanks!

Hi, I am having some trouble with supervisor. Maybe someone can help me out. Restarting my site works, but only once. Here is the logtail:

2006-08-31 08:46:05,568 INFO (Re)starting serpia_org
2006-08-31 08:46:05,627 INFO child with pid 3221 was reaped
2006-08-31 08:46:05,628 CRITICAL serpia_org: restarting too frequently; quit
2006-08-31 08:46:05,629 INFO pid 3221: exit status 0; exiting now
2006-08-31 08:46:05,630 INFO pid 3221: exit status 0; exiting now

Does anyone know how to fix this? Thanks, Dimitri

@Dimitri

I am sure Amix knows and he can help you out.

© 2000-2009 amix. Powered by Skeletonz.