Updating caching headers for Amazon S3 and CloudFront
I made a major blunder when setting caching headers for Amazon S3 and CloudFront. A blunder like this makes your sites slower and costs more in bandwidth. In this short blog post I will detail how to fix it and how to make sure you use correct caching headers.
Use the correct syntax
The first rule: make sure that the syntax is correct. Correct syntax looks like this:
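A minimal sketch of building well-formed values in Python, assuming a five-year lifetime. The helper name `build_cache_headers` and the `FIVE_YEARS` constant are made up for illustration; only `email.utils.formatdate` comes from the standard library.

```python
import time
from email.utils import formatdate

# Assumed lifetime for this sketch: 5 years, expressed in seconds.
FIVE_YEARS = 5 * 365 * 24 * 3600


def build_cache_headers(max_age=FIVE_YEARS, now=None):
    """Return Cache-Control and Expires values in the syntax browsers expect."""
    if now is None:
        now = time.time()
    return {
        # Directives are comma-separated; max-age is a whole number of seconds.
        'Cache-Control': 'public, max-age=%d' % max_age,
        # Expires is an HTTP-date in GMT; usegmt=True makes formatdate
        # end the string with "GMT" instead of a numeric offset.
        'Expires': formatdate(now + max_age, usegmt=True),
    }


# Fixed "now" so the result is reproducible.
headers = build_cache_headers(now=0)
print(headers['Cache-Control'])  # public, max-age=157680000
print(headers['Expires'])        # Tue, 31 Dec 1974 00:00:00 GMT
```

Passing `now` explicitly is only there to make the output predictable; in real use you would leave it out.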
I used the following syntax (it's wrong and won't be understood by browsers!):
Read RFC 2616 for all the details surrounding these headers.
Be greedy and use file versioning
Use file versioning (for example, make an MD5 hash part of the file name). You are forced to do this anyway, since CloudFront does not support invalidations that well. Already using file versioning? Great, then set your Expires header many years in the future: the file name changes whenever the file changes, so you don't have to worry about invalidating old files.
Made a blunder? Use my script to update all S3 files in a bucket
Before you update the headers of every S3 object, make sure the code works by testing it on dummy objects. I had a lot of issues getting it to work, since the copy replaces the older metadata rather than just updating it. You can use my script, but it's not bulletproof, so make sure that any other headers you rely on are copied over to the updated metadata. You will need to do the following:
#!/usr/bin/env python
"""
fix_s3_cache_headers
~~~~~~~~~~~~~~~~~~~~

Updates S3 objects with new cache-control headers.

Usage::

    python fix_cloudfront.py <bucket_name> <keys>*

Examples::

    Updates all keys of avatars.wedoist.com bucket::

        python fix_cloudfront.py avatars.wedoist.com

    Updates only one key::

        python fix_cloudfront.py avatars.w.com d39c2.gif

Read more here::

    http://amix.dk/blog/post/19687

:copyright: by Amir Salihefendic ( http://amix.dk/ )
:license: MIT
"""
import sys
import time
from email.utils import formatdate

from boto.s3.connection import S3Connection

#--- AWS credentials ----------------------------------------------
AWS_KEY = '...'
AWS_SECRET = '...'

# Cache lifetime: 5 years, in seconds
FIVE_YEARS = 3600 * 24 * 365 * 5


#--- Main function ----------------------------------------------
def main(s3_bucket_name, keys=None):
    s3_conn = S3Connection(AWS_KEY, AWS_SECRET)
    bucket = s3_conn.get_bucket(s3_bucket_name)

    if not keys:
        keys = bucket.list()

    for key in keys:
        # Keys given on the command line arrive as plain strings
        key_name = key if isinstance(key, str) else key.name

        # Force a fetch to get metadata
        # see this why: http://goo.gl/nLWt9
        key = bucket.get_key(key_name)
        if not key:
            print('Key not found %s' % key_name)
            continue

        aggressive_headers = _get_aggressive_cache_headers(key)
        # Copy the object onto itself; this replaces its metadata
        key.copy(s3_bucket_name, key.name,
                 metadata=aggressive_headers, preserve_acl=True)
        print('Updated headers for %s' % key.name)


#--- Helpers ----------------------------------------------
def _get_aggressive_cache_headers(key):
    metadata = key.metadata
    metadata['Content-Type'] = key.content_type

    # HTTP/1.0 (5 years); usegmt=True yields an HTTP-date ending in "GMT"
    metadata['Expires'] = formatdate(time.time() + FIVE_YEARS, usegmt=True)

    # HTTP/1.1 (5 years)
    metadata['Cache-Control'] = 'max-age=%d, public' % FIVE_YEARS

    return metadata


if __name__ == '__main__':
    main(sys.argv[1], sys.argv[2:])
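Since the copy replaces an object's metadata instead of merging into it, any header the script does not set explicitly can be lost. One way to carry such headers over might look like the sketch below. The function name `merge_existing_headers` and the `PRESERVED_HEADERS` list are hypothetical, and the sketch assumes the key object exposes attributes such as `content_encoding` alongside `metadata` and `content_type`; check that against the objects in your own bucket before relying on it.

```python
# Key attribute -> HTTP header it corresponds to.
# (Assumed list for illustration; extend it with the headers you rely on.)
PRESERVED_HEADERS = (
    ('content_type', 'Content-Type'),
    ('content_encoding', 'Content-Encoding'),
    ('content_disposition', 'Content-Disposition'),
    ('content_language', 'Content-Language'),
)


def merge_existing_headers(key, new_headers):
    """Combine a key's current headers with the new caching headers."""
    # Start from a copy so the key's own metadata dict is left untouched.
    metadata = dict(key.metadata or {})

    # Carry over standard headers that are stored as key attributes.
    for attr, header in PRESERVED_HEADERS:
        value = getattr(key, attr, None)
        if value:
            metadata[header] = value

    # The new caching headers win over anything already present.
    metadata.update(new_headers)
    return metadata
```

The returned dict could then be passed as the `metadata` argument of the copy call. Run it against a few dummy objects first and compare the headers before and after.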
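The file versioning idea mentioned earlier (making an MD5 hash part of the name) can be sketched roughly as follows. The function name `versioned_name` and the eight-character digest length are arbitrary choices for illustration, not something the script above depends on.

```python
import hashlib
import os


def versioned_name(filename, content, digest_length=8):
    """Return the file name with a short MD5 digest of its content inserted."""
    digest = hashlib.md5(content).hexdigest()[:digest_length]
    base, ext = os.path.splitext(filename)
    # style.css -> style.<digest>.css
    return '%s.%s%s' % (base, digest, ext)


# The MD5 of empty content is a well-known value, which keeps this predictable.
print(versioned_name('style.css', b''))  # style.d41d8cd9.css
```

Because the name changes whenever the content changes, each uploaded version can be served with far-future caching headers and never needs to be invalidated.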