Web application optimization

The last couple of days have been all about optimization! I have turned Skeletonz into a system that could handle a Slashdot effect! In this little article I will show you how I made my CMS application 14 times faster.

My testing is done for personal use - I am not biased. I have a great interest in using the best tools around; that's why I did this testing. I have only made graphs for the interesting stuff. At the bottom you will find a text file with all my tests.

Before - After

I took some benchmarks before I started and some more after I was finished. Check out the improvements! The application is actually over 100 times faster (if one compares caching off vs. caching on):

[Two graphs: Before - After]

Explaining the stuff you will see in the graphs

I tested using the Apache ab tool. In the graphs you will see:

  • The requests that failed / Total requests done to server @ Number of concurrent users
  • req. pr. sec. is the number of requests that the server could handle per second. The bigger this is, the better!
  • Application caching is internal caching (when it is turned on, content is only generated once)

You run the ab tool by typing ab -n 2000 -c 20 http://localhost:8080/ in a terminal (that is 2000 requests in total, 20 at a time).
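If you run many benchmarks, it helps to pull the headline numbers out of ab's report with a script. Here is a minimal sketch of such a parser. The function name and the sample text are my own illustration (the sample numbers are made up, not measurements); the field labels are the ones ab prints in its plain-text report.

```python
import re

def parse_ab_output(text):
    """Pull the headline numbers out of ab's plain-text report."""
    patterns = {
        "complete": r"Complete requests:\s+(\d+)",
        "failed": r"Failed requests:\s+(\d+)",
        "req_per_sec": r"Requests per second:\s+([\d.]+)",
    }
    result = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError("missing field in ab output: %s" % key)
        value = match.group(1)
        result[key] = float(value) if key == "req_per_sec" else int(value)
    return result

# Made-up sample in the shape ab prints; the numbers are not measurements.
SAMPLE = """\
Concurrency Level:      20
Complete requests:      2000
Failed requests:        0
Requests per second:    123.45 [#/sec] (mean)
"""

print(parse_ab_output(SAMPLE))
```

To feed it real data you could capture ab's output with subprocess.run(["ab", "-n", "2000", "-c", "20", url], capture_output=True, text=True).stdout, assuming ab is on your PATH.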

The setup

These tests were done on Mac OS X 10.4 on a dual 2.0 GHz PowerMac G5 with 2 GB RAM, using Python 2.4.

The site that is served has:

  • 4 KB HTML file
  • 5 images around 15 KB
  • 4 KB CSS
  • 20 KB JS

1. step: Use benchmarking

First I started using the ab tool!

You must remember that benchmarking is not everything :), but it gives you a pretty good idea of whether something is wrong - or whether you have made an improvement. It can even find bugs for you - during my testing I spotted 3 big bugs!

2. step: Know where to optimize

Switching frameworks will give you a little improvement. That said, I switched from CherryPy to RhubarbTart. It gave a 2-3 times improvement. I lost some features (since CherryPy is very feature rich compared to RhubarbTart), and got some speed.

Notice that this 2-3 times improvement meant VERY little when caching was turned off - it only meant something when application caching was turned on. See my test data file.

The worst mistake you can make is to switch to another framework because you want an improvement... This won't give you an amazing improvement, since the things that usually take time in a web application are:

  • Database queries
  • Template filling
  • Algorithms

Optimizing those leads to great improvements. Switching to another framework because you like it better is cool. That was the reason for the RhubarbTart switch.

Anyway, I got this 2-3 times boost by doing some extra work on RhubarbTart and the Myghty session middleware. I did the following:

  • Caching for the URL to object mapper that RhubarbTart uses. This gave a huge boost. (around 100 req. pr. sec extra)
  • Lazy evaluation of sessions, i.e. sessions are only loaded if you actually use them. This step also gave a huge boost. (around 100 req. pr. sec extra)
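To make the two ideas concrete, here is a rough sketch of what they could look like. This is not the actual RhubarbTart or Myghty code; the class names and the resolve/load callables are hypothetical stand-ins for the slow URL lookup and the session store.

```python
class CachedResolver:
    """Remember which object a URL path resolved to."""

    def __init__(self, resolve):
        self._resolve = resolve      # the slow lookup (hypothetical)
        self._cache = {}             # path -> resolved object

    def __call__(self, path):
        if path not in self._cache:
            self._cache[path] = self._resolve(path)
        return self._cache[path]


class LazySession:
    """Dict-like session whose data is loaded on first access only."""

    def __init__(self, load):
        self._load = load            # callable that fetches session data
        self._data = None

    def _ensure(self):
        if self._data is None:
            self._data = self._load()
        return self._data

    def __getitem__(self, key):
        return self._ensure()[key]

    def __setitem__(self, key, value):
        self._ensure()[key] = value

    @property
    def loaded(self):
        return self._data is not None
```

A request that never touches the session never pays for loading it, and a URL that has been seen before skips the mapper entirely. Note that the resolver cache in this sketch never expires, which is only safe if the URL to object mapping does not change while the server runs.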

3. step: Get dirty with the tools

I use the Cheetah template system, and I used Template(file=..) to generate my template. This takes time, since the template gets compiled to Python code on every request.

To fix this, make something that caches the compiled template - i.e. the trick is to make sure that the template is only compiled once! This gives a somewhat big improvement (maybe 5 times... depends on the context)!

My code hack: template.py code
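The compile-once idea can be sketched in a few lines. This is not my template.py; to keep the example runnable without Cheetah it uses string.Template from the standard library as a stand-in for the compiled template, and get_template and render are hypothetical names.

```python
from string import Template  # stdlib stand-in for a compiled Cheetah template

_compiled = {}  # path -> compiled template object

def get_template(path):
    """Return the compiled template for path, compiling it at most once."""
    template = _compiled.get(path)
    if template is None:
        with open(path) as handle:
            template = Template(handle.read())  # the expensive step
        _compiled[path] = template
    return template

def render(path, **values):
    """Fill the cached template with the given values."""
    return get_template(path).substitute(**values)
```

Every request after the first one only pays for filling the template, not for compiling it.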

4. step: Cache the expensive stuff

Here is a more general rule you can follow:

  • Cache the expensive stuff! Find the stuff that takes a lot of time and cache it, so you are sure it only runs once.

This step gives major improvements, but it can be tricky to get right: you must be sure to update the cache whenever the data is updated.

In Python a simple cache can be made by using a dictionary. My Cheetah template cache uses this trick. It does not check for updated templates, though it could :) It's a design decision.
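If you do want the check for updated files, one possible approach is to store the file's modification time next to the cached value. This is a sketch of that variant, not what Skeletonz does; cached_load and build are hypothetical names.

```python
import os

_cache = {}  # path -> (mtime, value)

def cached_load(path, build):
    """Return build(path), rebuilding only when the file's mtime changes."""
    mtime = os.path.getmtime(path)
    entry = _cache.get(path)
    if entry is None or entry[0] != mtime:
        entry = (mtime, build(path))
        _cache[path] = entry
    return entry[1]
```

The price is one file system stat call per lookup, which is why skipping the check is a reasonable design decision on a busy site.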

Anyway, I did the following caching:

  • I found out that the parsing of the special Skeletonz syntax takes a lot of time, so I created a simple caching mechanism
  • I found out that filling the templates takes a lot of time, so I cached it :]

I got up to maybe a 10 times improvement from this caching step - i.e. caching gives a lot of performance!
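The tricky part mentioned above, keeping the cache in sync with the data, can be handled by dropping an entry whenever its page is saved. Here is a minimal sketch under that assumption; PageCache and the render callable are hypothetical and not the actual Skeletonz code.

```python
class PageCache:
    """Cache rendered pages and drop an entry when its page is edited."""

    def __init__(self, render):
        self._render = render    # expensive: parse syntax + fill template
        self._pages = {}         # page_id -> rendered output

    def get(self, page_id):
        if page_id not in self._pages:
            self._pages[page_id] = self._render(page_id)
        return self._pages[page_id]

    def invalidate(self, page_id):
        # Call this from whatever code saves an edited page.
        self._pages.pop(page_id, None)
```

As long as every code path that changes a page calls invalidate, visitors never see stale content, and the expensive rendering still runs only once per edit.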

5. step: Try out other server setups

It can be cool to check out other server setups, since they may give a big performance boost. I tried the following WSGI application deployment setups - using FLUP:

  • FastCGI with lighttpd - could not get it to work
  • SCGI with lighttpd
  • mod_python - could not get it to work
  • Handling static content with Apache

SCGI

My SCGI setup did not give any improvement... It seems to be overrated. It was also very complicated to get this setup up and running. FastCGI is even more complex.

[Graph: FLUP]

Apache handling static content

I used a simple .htaccess mod_rewrite proxy setting to let Apache handle static files. This didn't give much either:

[Graph: Apache]

What server setup to choose?

SCGI and Apache handling of static files seem to be overrated. They don't seem to give an improvement. What they do is make your setup more complex...

I can also recommend using CherryPy's WSGI server. It's very fast.

Bugs found

MySQL/MySQLdb BUG!

I found a nasty MySQL/MySQLdb bug. MySQL went internally "down" when doing a lot of benchmarking. Postgres didn't fail. Here is a graph that shows that MySQL is a bit faster, but all 2000 requests FAIL.

[Graph: Database]

Paste HTTP server bug

Paste is a VERY slick mini framework. It has a WSGI server that has a bug. I tested it vs. CherryPy's WSGI server. This is/will be fixed.

[Two graphs: Cherry vs Paste]

Pooling bug

I also found a bug early in the process with my database connection pooling... More concurrent users resulted in database connection pooling disaster...

Test data

I have done a lot of testing. Here is the test data for some of the tests.

More optimizations

Along the way I also did a JavaScript and CSS optimization. All the internal style sheets and JavaScript files get automatically combined into two files (which are stripped of newlines, tabs and that jazz...). This means that the browser only makes 2 requests, one for the CSS file and one for the JS file.
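The combining step can be sketched roughly like this. It is a naive illustration, not the Skeletonz implementation, and combine is a hypothetical name. Be careful when using it on JavaScript: removing line breaks breaks code that uses // comments or relies on missing semicolons.

```python
import re

def combine(sources):
    """Join several CSS (or JS) sources into one stripped string."""
    joined = "\n".join(sources)
    # Drop tabs and line breaks, then squeeze runs of spaces.
    stripped = re.sub(r"[\t\r\n]+", "", joined)
    return re.sub(r" {2,}", " ", stripped)
```

You would run this once over the contents of all style sheets (and once over all scripts) and serve the two results as single files.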

Benchmarks · Code · Design · Python · Skeletonz 15. Mar 2006
© Amir Salihefendic. Powered by Skeletonz.