The Unladen Swallow strikes back

This is a follow-up to my earlier benchmark, "Python unladen-swallow at least 33% slower than Python 2.5.1". That benchmark was a quick one, and I wanted to create a more detailed one that also covers Unladen Swallow Q2 and Python 2.6 under some real load.

A little teaser is that Unladen Swallow Q2 seems to be making great progress :-)

About the hardware and the Python versions tested

These tests are run on a MacBook with a 2GHz Intel Core 2 Duo and 4GB of RAM.

The following Python versions are tested: Python 2.5.2, Python 2.6.2, Unladen Swallow Q1 and Unladen Swallow Q2.

They are all compiled with GCC 4.0.1 (Apple Inc. build 5465).

About the tests

The following tests are run on the above Python versions: the pystone benchmark, a cached profile rendering and a /faq rendering.

A warm-up phase of 1000 requests is used before each measurement. This should make the comparison fair for the Q2 version of unladen-swallow, which supports JIT.
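The warm-up idea can be sketched in a few lines of Python. This is only an illustration of the principle, not the tool used for the numbers below: `benchmark` and `handler` are hypothetical names, `handler` stands in for "issue one request", and the code is written for a current Python 3.

```python
import time


def benchmark(handler, warmup=1000, requests=1000):
    """Run `handler` untimed `warmup` times, then time `requests` calls.

    `handler` is a hypothetical callable that performs one request.
    Returns the requests per second of the timed phase only.
    """
    for _ in range(warmup):
        handler()  # untimed: gives a JIT the chance to compile hot code

    start = time.perf_counter()
    for _ in range(requests):
        handler()
    elapsed = time.perf_counter() - start
    return requests / elapsed
```

Calling `benchmark(some_request_function)` then reports throughput for the second batch of requests only, so an interpreter that needs time to warm up is not penalized for its first calls.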

pystone benchmarks

Python 2.5.2

Pystone(1.1) time for 50000 passes = 1.15
This machine benchmarks at 43478.3 pystones/second

Python 2.6.2

Pystone(1.1) time for 50000 passes = 1.07391
This machine benchmarks at 46559 pystones/second

Unladen Swallow Q1

Pystone(1.1) time for 50000 passes = 1.02964
This machine benchmarks at 48560.9 pystones/second

Unladen Swallow Q2

Pystone(1.1) time for 50000 passes = 1.45034
This machine benchmarks at 34474.7 pystones/second

And the winner is?

The winner seems to be Unladen Swallow Q1. Python 2.6 also seems to be faster than Python 2.5. Unladen Swallow Q2 is probably slower because the JIT has not yet kicked in, so this benchmark is useless for testing the real performance of Unladen Swallow Q2.
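The pystones/second figure is simply the number of passes divided by the elapsed time, so the results can be compared with a few lines of Python. The times are copied from the output above; the helper name `pystones_per_second` is mine.

```python
# Seconds for 50000 passes, copied from the pystone output above.
PASSES = 50000
times = {
    "Python 2.5.2": 1.15,
    "Python 2.6.2": 1.07391,
    "Unladen Swallow Q1": 1.02964,
    "Unladen Swallow Q2": 1.45034,
}


def pystones_per_second(seconds, passes=PASSES):
    """pystone reports passes divided by elapsed seconds."""
    return passes / seconds


baseline = pystones_per_second(times["Python 2.5.2"])
# Fastest first: the smallest time gives the highest rate.
for name, seconds in sorted(times.items(), key=lambda item: item[1]):
    rate = pystones_per_second(seconds)
    change = (rate / baseline - 1) * 100
    print("%-20s %9.1f pystones/s  %+6.1f%% vs Python 2.5.2" % (name, rate, change))
```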

Profile rendering, cached

This request makes a lot of calls to memcached. Python 2.6.2 crashes with a bus error when performing this benchmark, and I was not able to debug why.

Python 2.5.2

Concurrency Level:      10
Time taken for tests:   38.729 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      25109000 bytes
HTML transferred:       24972000 bytes
Requests per second:    25.82 [#/sec] (mean)
Time per request:       387.287 [ms] (mean)
Time per request:       38.729 [ms] (mean, across all concurrent requests)
Transfer rate:          633.13 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0  137 1390.8      0   19026
Processing:    11  179 781.8    111   19024
Waiting:       11  178 781.8    110   19024
Total:         11  316 1608.2    113   19482

Unladen Swallow Q1

Concurrency Level:      10
Time taken for tests:   31.635 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      24763000 bytes
HTML transferred:       24626000 bytes
Requests per second:    31.61 [#/sec] (mean)
Time per request:       316.347 [ms] (mean)
Time per request:       31.635 [ms] (mean, across all concurrent requests)
Transfer rate:          764.43 [Kbytes/sec] received

Unladen Swallow Q2

Concurrency Level:      10
Time taken for tests:   14.973 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      24692888 bytes
HTML transferred:       24555000 bytes
Requests per second:    66.79 [#/sec] (mean)
Time per request:       149.731 [ms] (mean)
Time per request:       14.973 [ms] (mean, across all concurrent requests)
Transfer rate:          1610.49 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0      10
Processing:    46  149 100.4    131    1239
Waiting:       46  148 100.4    130    1239
Total:         46  149 100.4    131    1240

The above benchmark only performed well for the first 1000 requests; afterwards the performance degraded to about 20 requests per second. I guess this is some kind of bug lurking in the Q2 release. I chose to use this benchmark anyway, as it clearly shows the potential of JIT optimizations.
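The summary lines in these reports are related by simple arithmetic, which is handy for sanity-checking a run. The sketch below reproduces the Python 2.5.2 figures from the raw counts. The function name and dictionary keys are my own, and the formulas are the ones that match the numbers printed above.

```python
def ab_summary(complete, seconds, concurrency, total_bytes):
    """Derive the summary figures from the raw counts of one run."""
    return {
        # Throughput of the whole run.
        "requests_per_second": complete / seconds,
        # Mean latency as seen by one of the concurrent clients.
        "ms_per_request": concurrency * seconds * 1000.0 / complete,
        # Mean time per request across all concurrent clients.
        "ms_per_request_all": seconds * 1000.0 / complete,
        # Transfer rate in Kbytes (1024 bytes) per second.
        "kbytes_per_second": total_bytes / 1024.0 / seconds,
    }


# Raw counts copied from the Python 2.5.2 profile rendering run.
py252 = ab_summary(complete=1000, seconds=38.729,
                   concurrency=10, total_bytes=25109000)
for key, value in sorted(py252.items()):
    print("%-22s %10.2f" % (key, value))
```

The mean time per request comes out a few microseconds off the reported 387.287 ms, because the report rounds the total time to three decimals before printing it.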

/faq rendering

This request does not make any requests to memcached or MySQL; it simply renders a Mako template.

Python 2.5.2

Concurrency Level:      10
Time taken for tests:   18.940 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      11526000 bytes
HTML transferred:       11389000 bytes
Requests per second:    52.80 [#/sec] (mean)
Time per request:       189.400 [ms] (mean)
Time per request:       18.940 [ms] (mean, across all concurrent requests)
Transfer rate:          594.29 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0      11
Processing:    51  189  70.0    188    1187
Waiting:       50  188  70.1    188    1187
Total:         51  189  70.0    188    1187

Python 2.6.2

Concurrency Level:      10
Time taken for tests:   17.327 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      11526000 bytes
HTML transferred:       11389000 bytes
Requests per second:    57.71 [#/sec] (mean)
Time per request:       173.272 [ms] (mean)
Time per request:       17.327 [ms] (mean, across all concurrent requests)
Transfer rate:          649.61 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.6      0      10
Processing:    20  173  86.6    174    1207
Waiting:       20  172  86.7    174    1207
Total:         20  173  86.6    174    1207

Unladen Swallow Q1

Concurrency Level:      10
Time taken for tests:   44.839 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      11526000 bytes
HTML transferred:       11389000 bytes
Requests per second:    22.30 [#/sec] (mean)
Time per request:       448.391 [ms] (mean)
Time per request:       44.839 [ms] (mean, across all concurrent requests)
Transfer rate:          251.03 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.6      0      10
Processing:    49  448 141.6    462    1407
Waiting:       49  447 141.7    461    1406
Total:         50  448 141.5    462    1408

I have no clue why /faq was so slow on Q1, but I was able to reproduce this in 2 reruns.

Unladen Swallow Q2

Concurrency Level:      10
Time taken for tests:   16.678 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      11526000 bytes
HTML transferred:       11389000 bytes
Requests per second:    59.96 [#/sec] (mean)
Time per request:       166.780 [ms] (mean)
Time per request:       16.678 [ms] (mean, across all concurrent requests)
Transfer rate:          674.89 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   1.0      0      11
Processing:    26  166  87.5    167    1149
Waiting:       25  165  87.6    167    1148
Total:         26  166  87.5    167    1149

Resource usage

Python 2.5.2

  PID COMMAND      %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
74080 python       0.0%  1:14.32   9    77    298   46M   688K    49M    68M 

Unladen Swallow Q1

  PID COMMAND      %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
74273 python       0.0%  3:55.68   9    63    302   65M+  188K-   68M    87M 

Unladen Swallow Q2

  PID COMMAND      %CPU   TIME   #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
74380 python       0.0%  1:09.29   9    70    356   75M   188K    83M   123M 
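To compare the CPU time of the three processes it helps to convert the TIME column to seconds. The helper below is a small sketch that assumes TIME is formatted as minutes:seconds with a single colon, as in the rows above; `cpu_seconds` is a name I made up.

```python
def cpu_seconds(top_time):
    """Convert a TIME value such as '1:14.32' to seconds.

    Assumes the minutes:seconds format shown in the rows above.
    """
    minutes, seconds = top_time.split(":")
    return int(minutes) * 60 + float(seconds)


# TIME values copied from the three rows above.
cpu = {
    "Python 2.5.2": cpu_seconds("1:14.32"),
    "Unladen Swallow Q1": cpu_seconds("3:55.68"),
    "Unladen Swallow Q2": cpu_seconds("1:09.29"),
}
for name, seconds in sorted(cpu.items(), key=lambda item: item[1]):
    print("%-20s %7.2f s of CPU time" % (name, seconds))
```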

Conclusion

Like all benchmarks, you should take this one with a grain of salt! That said, let's conclude some things about performance:

  • Unladen Q1 wins the pystone benchmark, which does not mean much to me though
  • Unladen Q2 wins the profile rendering and renders over 2 times faster than Python 2.5.2 and Unladen Swallow Q1, which is pretty impressive
  • Python 2.6.2 crashes on profile rendering benchmark
  • Unladen Q2 wins the /faq rendering, but not by a large margin
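The ratios behind these points can be recomputed from the "Requests per second" lines of the reports. A small sketch, using Python 2.5.2 as the baseline; the `speedups` helper is mine.

```python
# Requests per second, copied from the reports above.
profile = {
    "Python 2.5.2": 25.82,
    "Unladen Swallow Q1": 31.61,
    "Unladen Swallow Q2": 66.79,
}
faq = {
    "Python 2.5.2": 52.80,
    "Python 2.6.2": 57.71,
    "Unladen Swallow Q1": 22.30,
    "Unladen Swallow Q2": 59.96,
}


def speedups(results, baseline="Python 2.5.2"):
    """Return each interpreter's throughput relative to the baseline."""
    base = results[baseline]
    return dict((name, rps / base) for name, rps in results.items())


for title, results in (("profile", profile), ("/faq", faq)):
    for name, factor in sorted(speedups(results).items()):
        print("%-8s %-20s %.2fx" % (title, name, factor))
```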

Conclusions about resource usage:

  • Unladen Swallow Q1 uses a lot of CPU and memory, without rendering much faster than Python 2.5.2
  • Unladen Q2 uses the least amount of CPU (yay for JIT!)
  • Python 2.5.2 uses the least amount of memory
  • Unladen Q2 uses the most memory. This is expected, as memory optimizations will come in the Q3 release of Unladen Swallow

All in all, I must say I am impressed by the progress that Unladen Swallow Q2 shows. A 2x performance improvement is a big deal, and I think we can expect much more from the Q3 release.

29. Aug 2009 Benchmarks · Code · Plurk · Python
© Amir Salihefendic