How Plurk does systems monitoring

[Image: Plurk puppet master]

I think that monitoring, and the overview it gives you, is one of the most important parts of running a healthy web application. As far as I can see, the tooling for monitoring is pretty underdeveloped, so I want to showcase Plurk's monitoring tools, which we will probably open source in the future.

First of all, we at Plurk use the standard monitoring tools, Cacti and Nagios.

Both are pretty essential in our toolchain, but they fail to answer some essential questions:

  • What database queries are the most expensive? For the most expensive queries, what indexes do they use?
  • What requests are the most expensive? Which requests do the most selects to the database? The most updates? The most requests to memcached?
  • What are the most common errors thrown, and what do their tracebacks look like?

Once you have these answers, it's much easier to optimize and fix errors, and you'll feel blind whenever you don't have them at your disposal.

Let's look at how these different questions are answered.

SQL Monitor

SQL monitor aggregates queries, groups them, times them and runs a SQL EXPLAIN on them. It's like the MySQL Enterprise SQL monitor (which costs $595 per server per year!). Our SQL monitor can log queries from all our servers and has a web front end that can sort results by average execution time, number of times run, etc.

An example screenshot of Plurk's SQL Monitor:

[Screenshot: Plurk SQL monitor]
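To give a feel for the grouping and timing part, here is a minimal sketch in Python. It is not Plurk's actual code: the `normalize` function, the `QueryStats` class and the regular expressions are assumptions made up for illustration, and the EXPLAIN step and the web front end are left out entirely.

```python
import re
from collections import defaultdict

# Hypothetical patterns: replace literals so that queries differing only in
# their parameters end up in the same group.
_STRING = re.compile(r"'(?:[^'\\]|\\.)*'")
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")
_SPACE = re.compile(r"\s+")


def normalize(sql):
    """Turn a concrete query into a grouping key."""
    sql = _STRING.sub("?", sql)
    sql = _NUMBER.sub("?", sql)
    return _SPACE.sub(" ", sql).strip().lower()


class QueryStats:
    """Collects run counts and total time per normalized query."""

    def __init__(self):
        # normalized query -> [times run, total seconds]
        self._stats = defaultdict(lambda: [0, 0.0])

    def record(self, sql, seconds):
        entry = self._stats[normalize(sql)]
        entry[0] += 1
        entry[1] += seconds

    def report(self):
        """Return (query, times run, average seconds), slowest first."""
        rows = [(q, n, total / n) for q, (n, total) in self._stats.items()]
        return sorted(rows, key=lambda row: row[2], reverse=True)


stats = QueryStats()
stats.record("SELECT * FROM plurks WHERE id = 1", 0.010)
stats.record("SELECT * FROM plurks WHERE id = 2", 0.030)
stats.record("SELECT * FROM users WHERE name = 'amir'", 0.200)
for query, times_run, average in stats.report():
    print(f"{average:.3f}s  x{times_run}  {query}")
```

In a real setup the `record` call would sit in a wrapper around the database driver, and each server would send its numbers to one central place.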

Request monitor

I wrote the request logger some time ago and it's open sourced. Basically, it logs requests, times them, groups them and provides a web interface so one can easily see stats about the requests.

An example screenshot of Plurk's Request Monitor:

[Screenshot: Plurk request monitor]
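The same idea applied to requests could look roughly like the WSGI middleware below. This is a sketch under assumptions, not the open sourced request logger itself: the `RequestStats` class and its method names are hypothetical, it groups by HTTP method and path only, and it times just the call into the application, not the streaming of the response body.

```python
import time
from collections import defaultdict


class RequestStats:
    """Times WSGI requests and groups them by (method, path)."""

    def __init__(self):
        self.stats = defaultdict(lambda: {"count": 0, "total": 0.0})

    def middleware(self, app):
        """Wrap a WSGI app so that every call is timed and counted."""
        def wrapped(environ, start_response):
            key = (environ.get("REQUEST_METHOD", "GET"),
                   environ.get("PATH_INFO", "/"))
            started = time.perf_counter()
            try:
                return app(environ, start_response)
            finally:
                # Runs even if the app raises, so failed requests count too.
                entry = self.stats[key]
                entry["count"] += 1
                entry["total"] += time.perf_counter() - started
        return wrapped

    def slowest(self, limit=10):
        """Return (key, count, average seconds), slowest first."""
        rows = [(key, s["count"], s["total"] / s["count"])
                for key, s in self.stats.items()]
        rows.sort(key=lambda row: row[2], reverse=True)
        return rows[:limit]
```

Counting selects, updates and memcached calls per request would need extra counters that the database and cache wrappers increment while the request is running.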

Central logger

Like SQL Monitor and Request monitor, the central logger logs errors from all our servers and groups them together. It can quickly answer which errors are most common and give debug information about them.

An example screenshot of Plurk's Central logger:

[Screenshot: Plurk central logger]
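Grouping errors can be sketched with a handler for Python's standard `logging` module. Again, this is only an illustration: the `GroupingHandler` class is hypothetical, it groups by exception type plus the file and function where the error was raised, and it keeps everything in memory in one process instead of collecting from many servers.

```python
import logging
import traceback
from collections import Counter


class GroupingHandler(logging.Handler):
    """Counts logged exceptions per group and keeps one sample traceback."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.counts = Counter()
        self.samples = {}

    def emit(self, record):
        # Only records logged with exception info can be grouped.
        if not record.exc_info or record.exc_info[0] is None:
            return
        exc_type, exc, tb = record.exc_info
        frames = traceback.extract_tb(tb)
        if frames:
            where = (frames[-1].filename, frames[-1].name)
        else:
            where = ("?", "?")
        key = (exc_type.__name__,) + where
        self.counts[key] += 1
        # Keep the first traceback seen for each group as debug information.
        self.samples.setdefault(
            key, "".join(traceback.format_exception(exc_type, exc, tb)))

    def most_common(self, limit=10):
        """Return ((type, file, function), count) pairs, most frequent first."""
        return self.counts.most_common(limit)
```

Attaching the handler with `logging.getLogger().addHandler(GroupingHandler())` and logging errors through `logger.exception(...)` is enough to fill it. A central version would ship each record over the network to one collector.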

Conclusion

Most of these tools are not that hard to make, but they provide essential information that gives you an overview and can help you create a more robust and healthy product.

28. Mar 2009 Plurk · Tips
© Amir Salihefendic