Updating caching headers for Amazon S3 and CloudFront

I made a major blunder when setting caching headers for Amazon S3 and CloudFront. A blunder like this makes your sites slower and costs more in bandwidth. In this short post I will detail how to fix it and how to make sure you use correct caching headers.

Use the correct syntax

The first rule: make sure that the syntax is correct. Correct syntax looks like this:

  • Cache-Control: max-age=155520000, public
  • Expires: Sat, 29 Apr 2017 13:31:45 GMT

I used the following syntax (it's wrong and won't be understood by browsers!):

  • Cache-Control: max-age 155520000

Read RFC 2616 for all the details surrounding these headers.
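
To make the difference concrete, here is a minimal sketch that builds both headers with the standard library and checks a Cache-Control value for a well-formed max-age directive. The helper names build_cache_headers and has_valid_max_age are my own illustrations, not part of any library, and the check only covers max-age rather than the full header grammar.

```python
import re
import time
from email.utils import formatdate

# 5 * 360 days in seconds (155520000), the max-age value used in the example above
FIVE_YEARS = 3600 * 24 * 360 * 5


def build_cache_headers(max_age=FIVE_YEARS, now=None):
    """Return Cache-Control and Expires values for the given lifetime in seconds."""
    if now is None:
        now = time.time()
    return {
        'Cache-Control': 'max-age=%d, public' % max_age,
        # usegmt=True produces an HTTP date that ends in "GMT"
        'Expires': formatdate(now + max_age, usegmt=True),
    }


def has_valid_max_age(cache_control):
    """True if one comma-separated directive is exactly max-age=<digits>."""
    directives = [d.strip() for d in cache_control.split(',')]
    return any(re.match(r'^max-age=\d+$', d) for d in directives)
```

With this check, 'max-age=155520000, public' passes while the broken 'max-age 155520000' does not.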

Be greedy and use file versioning

Use file versioning (for example, make an md5 hash a part of the name). You are forced to do this anyway since CloudFront does not support invalidations that well.

Already using file versioning? Great, then set your Expires header many years in the future, since the filename will change when the file changes (i.e. you don't have to worry about invalidating old files).
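
One way to do the versioning could look like the sketch below. The helper name versioned_name and the choice to keep only the first 8 hex characters of the hash are assumptions for illustration; any scheme works as long as the name changes whenever the content does.

```python
import hashlib
import os


def versioned_name(filename, content):
    """Insert the first 8 hex chars of the content's md5 before the extension."""
    digest = hashlib.md5(content).hexdigest()[:8]
    root, ext = os.path.splitext(filename)
    return '%s.%s%s' % (root, digest, ext)
```

For example, versioned_name('logo.gif', data) gives a name of the form logo.<hash>.gif, and a changed file gets a new name, so the old cached copy is simply never requested again.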

Made a blunder? Use my script to update all S3 files in a bucket

Before you update the headers of every S3 object, make sure that the code works by testing it on dummy objects. I had a lot of issues getting it to work, since the copy replaces the older metadata instead of just updating it. You can use my script (but it's not bulletproof, so be sure that any other headers you use are copied over to the updated metadata).
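
Because the metadata is replaced as a whole, it helps to build the complete set of headers first and only then write it back. A minimal sketch of that merge on plain dicts follows; merge_metadata is a hypothetical helper, not part of boto, and it treats header names as case-insensitive.

```python
def merge_metadata(existing, updates):
    """Return a new dict: existing headers kept, updates applied on top."""
    merged = dict(existing)
    overridden = {name.lower() for name in updates}
    # Drop old values that an update overrides, whatever the header-name case
    for name in list(merged):
        if name.lower() in overridden:
            del merged[name]
    merged.update(updates)
    return merged
```

The result keeps headers such as Content-Type from the old object while the broken cache headers are swapped for the corrected ones.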

You will need to do the following:

  • Test the script below on dummy S3 objects
  • Update the headers of every S3 object
  • Create new Amazon CloudFront distributions after the S3 objects are updated. This can be done via aws.amazon.com
  • Update DNS records to use the new distributions
#!/usr/bin/env python
"""
    fix_s3_cache_headers
    ~~~~~~~~

    Updates S3 objects with new cache-control headers.

    Usage::
        python fix_cloudfront.py <bucket_name> <keys>*

    Examples::
        Updates all keys of avatars.wedoist.com bucket::
            python fix_cloudfront.py avatars.wedoist.com

        Updates only one key::
            python fix_cloudfront.py avatars.w.com d39c2.gif

    Read more here::
        http://amix.dk/blog/post/19687

    :copyright: by Amir Salihefendic ( http://amix.dk/ )
    :license: MIT
"""
import sys
import email
import time
import types
from datetime import datetime, timedelta

from boto.s3.connection import S3Connection


#--- AWS credentials ----------------------------------------------
AWS_KEY = '...'
AWS_SECRET = '...'


#--- Main function ----------------------------------------------
def main(s3_bucket_name, keys=None):
    s3_conn = S3Connection(AWS_KEY, AWS_SECRET)

    bucket = s3_conn.get_bucket(s3_bucket_name)

    if not keys:
        keys = bucket.list()

    for key in keys:
        if type(key) == types.StringType:
            key_name = key
            key = bucket.get_key(key)
            if not key:
                print 'Key not found %s' % key_name
                continue

        # Force a fetch to get metadata (keys from bucket.list() do not carry it)
        # see why: http://goo.gl/nLWt9
        key = bucket.get_key(key.name)

        aggressive_headers = _get_aggressive_cache_headers(key)
        # Copy the object onto itself; passing metadata replaces the stored headers
        key.copy(s3_bucket_name, key.name, metadata=aggressive_headers, preserve_acl=True)

        print 'Updated headers for %s' % key.name


#--- Helpers ----------------------------------------------
def _get_aggressive_cache_headers(key):
    metadata = key.metadata

    metadata['Content-Type'] = key.content_type

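    # HTTP/1.0 (5 years); usegmt=True gives an HTTP date ending in "GMT"
    # instead of a "-0000" zone offset
    metadata['Expires'] = email.Utils.formatdate(
        time.mktime((datetime.now() +
                     timedelta(days=365*5)).timetuple()),
        usegmt=True)

    # HTTP/1.1 (about 5 years: 5 * 360 days = 155520000 seconds)
    metadata['Cache-Control'] = 'max-age=%d, public' % (3600 * 24 * 360 * 5)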

    return metadata


if __name__ == '__main__':
    main( sys.argv[1],
          sys.argv[2:] )
2. May 2012 Code · Python · Stuff
© Amir Salihefendic