Releasing bitmapist.cohort - or how we saved over $2000/month!


I released bitmapist a while ago, and I am happy to announce an extension that makes the library much more powerful!

Installation and source code

sudo pip install -U bitmapist

Fork the code on GitHub.

The reason why I implemented this

I want to tell you why I coded this (and how we saved over $2000/month by having this library). I looked at Mixpanel's retention feature, which seems amazing. The problem for us is that we would need to track over 20 million events per month, and Mixpanel is crazy expensive at that volume (it would cost us over $2000/month to get this feature!)

So I did what any sensible person would do: I coded my own version and open-sourced it so others can contribute. For example, there's already a PHP port of bitmapist!

What can it help me with?

This library makes it possible to implement real-time, highly scalable analytics that help with tasks such as:

  • Generate a cohort table over real-time data stored in bitmapist
  • What percentage of users who were active in the last [days, weeks, months] are still active?
  • What percentage of users who performed action X also performed action Y (and how does this change over time)?
  • And a lot of other things
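These questions all reduce to set operations. bitmapist stores each event per time period as a Redis bitmap with one bit per user ID, so "how many users who did X also did Y" becomes a bitwise AND followed by a bit count. The sketch below shows the idea in plain Python, using ints as stand-ins for Redis bitmaps; the helper names (bitmap_from_ids, count, percent_retained) are made up for illustration and are not part of bitmapist's API.

```python
# In-memory illustration of bitmap-based retention (not bitmapist's code).
# Each bitmap is a Python int where bit N is set if user N did the event.


def bitmap_from_ids(user_ids):
    """Build a bitmap (as an int) with one bit set per user ID."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap


def count(bitmap):
    """Population count: how many users are in the bitmap."""
    return bin(bitmap).count("1")


def percent_retained(before, after):
    """Percentage of users in `before` who also appear in `after` (bitwise AND)."""
    base = count(before)
    if base == 0:
        return 0.0
    return 100.0 * count(before & after) / base


active_last_week = bitmap_from_ids([1, 2, 3, 4])
active_this_week = bitmap_from_ids([2, 4, 5])
print(percent_retained(active_last_week, active_this_week))  # 50.0
```

The same AND-then-count works for "did X also do Y": swap the second bitmap for the one holding action Y. Because each user costs one bit per period, the cost depends on the number of user IDs rather than the number of events.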

If you want to read more about cohort analysis, please read the following:

Screenshot of bitmapist.cohort

With bitmapist.cohort you get an HTML form and a table rendering of the data you keep in bitmapist. If this sounds confusing, look at Mixpanel's retention feature, which inspired it. Here's a screenshot:

bitmapist.cohort screenshot

Getting started

Mark user 123 as active and mark some other events:

from bitmapist import mark_event
from bitmapist import cohort as bitmapist_cohort

# bitmapist stores its bitmaps in Redis, so a Redis server must be running
mark_event('active', 123)
mark_event('song:add', 123)
mark_event('song:play', 123)
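Roughly speaking, each mark_event call sets the user's bit (Redis SETBIT) in one bitmap per time period it tracks, such as the current day, week and month. The sketch below mimics that with an in-memory dict; the key format, the use of ISO weeks and mark_event_sketch itself are hypothetical, and bitmapist's real key names and options differ.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for Redis: key -> bitmap stored as an int.
store = defaultdict(int)


def period_keys(event_name, now):
    """Illustrative key names for the month, ISO week and day buckets."""
    iso_year, iso_week, _ = now.isocalendar()
    return [
        f"{event_name}:month:{now:%Y-%m}",
        f"{event_name}:week:{iso_year}-W{iso_week:02d}",
        f"{event_name}:day:{now:%Y-%m-%d}",
    ]


def mark_event_sketch(event_name, user_id, now=None):
    """Set the user's bit in every time bucket, like SETBIT <key> <user_id> 1."""
    now = now or datetime.now(timezone.utc)
    for key in period_keys(event_name, now):
        store[key] |= 1 << user_id


mark_event_sketch("active", 123, now=datetime(2012, 12, 13))
print(sorted(store))
```

Marking the same user twice in the same period is a no-op, which is why bitmaps answer "how many distinct users" questions cheaply.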

Generate the form that makes it easy to query the bitmapist database:

html_form = bitmapist_cohort.render_html_form(
    action_url='/_Cohort',
    selections1=[ ('Are Active', 'active'), ],
    selections2=[ ('Played song', 'song:play'), ],
    time_group='days',
    select1='active',
    select2='song:play'
)

# action_url is the action URL of the FORM element
# selections1, selections2 specify the events that the user can select in the form
# time_group can be `days`, `weeks` or `months`
# select1, select2 specify the currently selected events in the FORM
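time_group decides what each row and column of the cohort table represents. One hypothetical way to map a date to the start of its day, week or month bucket, and to step forward one bucket at a time, is sketched below; the function names are invented, and weeks are assumed to start on Monday, which may not match bitmapist.

```python
from datetime import date, timedelta


def period_start(day, time_group):
    """Return the first day of the day/week/month bucket containing `day`."""
    if time_group == "days":
        return day
    if time_group == "weeks":
        return day - timedelta(days=day.weekday())  # Monday of that week
    if time_group == "months":
        return day.replace(day=1)
    raise ValueError(f"unknown time_group: {time_group!r}")


def next_period(start, time_group):
    """Advance a bucket start date by exactly one period."""
    if time_group == "days":
        return start + timedelta(days=1)
    if time_group == "weeks":
        return start + timedelta(weeks=1)
    if time_group == "months":
        if start.month == 12:
            return date(start.year + 1, 1, 1)
        return date(start.year, start.month + 1, 1)
    raise ValueError(f"unknown time_group: {time_group!r}")


def periods(first_day, time_group, n):
    """List n consecutive bucket start dates, beginning with first_day's bucket."""
    start = period_start(first_day, time_group)
    result = []
    for _ in range(n):
        result.append(start)
        start = next_period(start, time_group)
    return result


print(periods(date(2012, 11, 30), "months", 3))
```

Months are handled separately because they have no fixed length, so a timedelta cannot step through them.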

Get the data and render it via HTML:

dates_data = bitmapist_cohort.get_dates_data(select1='active',
                                             select2='song:play',
                                             time_group='days',
                                             system='default')

html_data = bitmapist_cohort.render_html_data(dates_data,
                                              time_group='days')

# In a real app all of these arguments come from the submitted FORM (html_form);
# they are filled in directly here to keep the example clear
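To give a feel for what ends up in the cohort table, here is a simplified model of the computation: each row is the group of users who did select1 on a given day, and each column is the percentage of that group who did select2 some number of days later. This is an in-memory illustration with invented data structures (a dict keyed by (event, day)), not bitmapist's get_dates_data implementation or return format; starting the columns at offset 0 (the same day) is also an assumption.

```python
from datetime import date, timedelta


def popcount(bitmap):
    """Number of set bits, i.e. number of users in the bitmap."""
    return bin(bitmap).count("1")


def cohort_rows(bitmaps, select1, select2, start_days, num_periods):
    """For each start day: (day, cohort size, [% of cohort doing select2 on day + i])."""
    rows = []
    for day in start_days:
        cohort = bitmaps.get((select1, day), 0)
        size = popcount(cohort)
        percents = []
        for offset in range(num_periods):
            later = bitmaps.get((select2, day + timedelta(days=offset)), 0)
            percents.append(100.0 * popcount(cohort & later) / size if size else 0.0)
        rows.append((day, size, percents))
    return rows


d0 = date(2012, 12, 10)
data = {
    ("active", d0): 0b1111,                          # users 0-3 active on d0
    ("song:play", d0): 0b0011,                       # users 0 and 1 played a song on d0
    ("song:play", d0 + timedelta(days=1)): 0b0001,   # user 0 played again the next day
}
for row in cohort_rows(data, "active", "song:play", [d0], 2):
    print(row)
```

Every cell costs one AND and one bit count, which is what keeps the table fast enough to render on demand.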

Plug the above code into your codebase and you are ready to go :-)
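In a web app the three selections arrive as request parameters. A minimal sketch using only the standard library to read and validate them, then pass them to the calls shown above. It assumes the generated form submits fields named select1, select2 and time_group (check the HTML that render_html_form produces); ALLOWED_EVENTS, cohort_args and render_cohort_page are hypothetical names.

```python
from urllib.parse import parse_qs

# Hypothetical whitelists: use the event names you offer in selections1/selections2.
ALLOWED_EVENTS = {"active", "song:add", "song:play"}
ALLOWED_TIME_GROUPS = {"days", "weeks", "months"}


def cohort_args(query_string):
    """Read select1/select2/time_group from a submitted form, falling back to defaults."""
    params = parse_qs(query_string)

    def pick(name, allowed, default):
        value = params.get(name, [default])[0]
        return value if value in allowed else default

    return {
        "select1": pick("select1", ALLOWED_EVENTS, "active"),
        "select2": pick("select2", ALLOWED_EVENTS, "song:play"),
        "time_group": pick("time_group", ALLOWED_TIME_GROUPS, "days"),
    }


def render_cohort_page(query_string):
    """Build the form and table HTML; needs bitmapist and a reachable Redis."""
    from bitmapist import cohort as bitmapist_cohort  # imported lazily

    args = cohort_args(query_string)
    html_form = bitmapist_cohort.render_html_form(
        action_url="/_Cohort",
        selections1=[("Are Active", "active")],
        selections2=[("Played song", "song:play")],
        time_group=args["time_group"],
        select1=args["select1"],
        select2=args["select2"],
    )
    dates_data = bitmapist_cohort.get_dates_data(system="default", **args)
    html_data = bitmapist_cohort.render_html_data(dates_data, time_group=args["time_group"])
    return html_form + html_data
```

Whitelisting the submitted values keeps arbitrary user input out of the event names bitmapist looks up.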

Happy hacking!

13. Dec 2012 Announcements · Code · Database
© Amir Salihefendic