I have thousands of media files in S3.
- The files can be of various MIME types: plain text, HTML, XML, PDF, binary, zip, etc.
- In addition, some files might be gzipped.
I would like to render these files in a Django app. I don't want to give users direct access to S3. In some cases, I want to modify the file before rendering it.
For example:
- /base/path/file_name_aaa.txt.gz <--- download from S3, decompress, and render as preformatted text through Django
- /base/path/file_name_aaa.pdf <--- download from S3 and render as PDF through Django
- /base/path/file_name_bbb.pdf.gz <--- download from S3, decompress, and render as PDF through Django
- /base/path/file_name_ccc.xml.gz <--- download from S3, decompress, replace some content, and render as uncompressed XML through Django
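A minimal sketch of how the cases above might be told apart from the key name alone, assuming Python 3 and the standard library's `mimetypes` module. The helper name `classify` is hypothetical, and the exact type guessed for less common extensions can depend on the system's MIME tables.

```python
import mimetypes


def classify(name):
    """Return (content_type, is_gzipped) for an S3 key name (hypothetical helper)."""
    # guess_type reports a trailing ".gz" as the encoding and guesses the
    # content type from the extension underneath it.
    content_type, encoding = mimetypes.guess_type(name)
    # Fall back to a generic binary type when the extension is unknown.
    return content_type or "application/octet-stream", encoding == "gzip"


for name in ("file_name_aaa.txt.gz", "file_name_aaa.pdf", "file_name_bbb.pdf.gz"):
    print(name, classify(name))
```

The returned content type could then be used as the response's `Content-Type`, and the boolean decides whether the body needs decompressing first.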
I have the first part working, for gzipped plain text:
from boto.s3.connection import S3Connection
import zlib

def get_gzipped_content(stream):
    content = ''
    for part in stream_decompress(stream):
        content += part
    return content

def stream_decompress(stream):
    '''
    decompress s3 gzipped stream
    http://stackoverflow.com/questions/12571913/python-unzipping-stream-of-bytes
    '''
    dec = zlib.decompressobj(16+zlib.MAX_WBITS)  # same as gzip module
    for chunk in stream:
        rv = dec.decompress(chunk)
        if rv:
            yield rv

conn = S3Connection(aws_key, aws_secret)
fname = 'aaa/bbb/ccc_1234.txt.gz'
key = conn.get_bucket('my_bucket').get_key(fname)
if fname.lower().endswith('.gz'):
    content = get_gzipped_content(key)
else:
    content = key.get_contents_as_string()
# (render content as string in django)
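For the other cases, the decompress-then-modify step might look roughly like the sketch below. It assumes Python 3 (where `gzip.decompress` is available) and that the whole object fits in memory. `prepare_body` and the `transform` callback are hypothetical names, and the sample XML is made up for illustration.

```python
import gzip


def prepare_body(raw, gzipped, transform=None):
    """Return the bytes to send to the browser (hypothetical helper).

    raw       -- bytes downloaded from S3
    gzipped   -- True if the object is gzip-compressed
    transform -- optional callable taking bytes and returning modified bytes
    """
    # Decompress only when the object is gzipped; otherwise pass through.
    body = gzip.decompress(raw) if gzipped else raw
    # Apply the optional content replacement after decompression.
    if transform is not None:
        body = transform(body)
    return body


# Illustrative data only: a gzipped XML payload with one value replaced.
sample = gzip.compress(b"<doc>internal-host</doc>")
body = prepare_body(sample, True, lambda b: b.replace(b"internal-host", b"example.com"))
print(body)  # b'<doc>example.com</doc>'
```

In a Django view, the result could presumably be returned with `HttpResponse(body, content_type=...)`, using a content type such as `application/pdf` or `text/plain` depending on the file.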
I would appreciate help handling the other MIME types, both gzipped and uncompressed.