Parse multipart request string in Python

Question

I have a string like this

"--5b34210d81fb44c5a0fdc1a1e5ce42c3\r\nContent-Disposition: form-data; name=\"author\"\r\n\r\nJohn Smith\r\n--5b34210d81fb44c5a0fdc1a1e5ce42c3\r\nContent-Disposition: form-data; name=\"file\"; filename=\"example2.txt\"\r\nContent-Type: text/plain\r\nExpires: 0\r\n\r\nHello World\r\n--5b34210d81fb44c5a0fdc1a1e5ce42c3--\r\n"

I also have the request headers available in other variables.

How do I easily parse this with Python3?

I am handling a file upload in AWS Lambda via API Gateway; the request body and headers are available as Python dicts.

There are other similar questions on Stack Overflow, but most assume the use of the requests module or other modules and expect the request details to be in a specific object or format.

NOTE: I am aware it's possible to have the user upload to S3 and trigger Lambda, but I am intentionally choosing not to do that in this case.

score 7 · Accepted Answer · answered Jun 19 '18 at 12:12

It can be parsed by using something like

from requests_toolbelt.multipart import decoder
multipart_string = "--ce560532019a77d83195f9e9873e16a1\r\nContent-Disposition: form-data; name=\"author\"\r\n\r\nJohn Smith\r\n--ce560532019a77d83195f9e9873e16a1\r\nContent-Disposition: form-data; name=\"file\"; filename=\"example2.txt\"\r\nContent-Type: text/plain\r\nExpires: 0\r\n\r\nHello World\r\n--ce560532019a77d83195f9e9873e16a1--\r\n"
content_type = "multipart/form-data; boundary=ce560532019a77d83195f9e9873e16a1"
# On Python 3 the decoder expects bytes, so encode the string first
decoder.MultipartDecoder(multipart_string.encode("utf-8"), content_type)

Sam Anthony

You should hopefully find that `multipart/form-data` is sufficient as `content_type`... because the boundary string is not something you should have to find for yourself, and will typically vary for each message. – Michael - sqlbot Jun 20 '18 at 11:42

Thanks for the info. It seemed that the boundary in the header might have actually been required by MultipartDecoder to parse the multipart string. I ended up implementing it to use the correct mime-type anyway which was available in other variables presented by AWS Lambda. – Sam Anthony Jun 20 '18 at 15:40
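
To illustrate the point about the boundary living in the header: a minimal sketch, using only the standard library, of pulling the MIME type and boundary out of a `Content-Type` value. The helper name `split_content_type` is made up for this example; it only shows that a bare `multipart/form-data` carries no boundary for a parser to split on.

```python
from email.message import Message


def split_content_type(header_value):
    """Return (mime_type, boundary) parsed from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_boundary() returns None when the header has no boundary parameter
    return msg.get_content_type(), msg.get_boundary()


with_boundary = split_content_type(
    "multipart/form-data; boundary=ce560532019a77d83195f9e9873e16a1"
)
without_boundary = split_content_type("multipart/form-data")
print(with_boundary)
print(without_boundary)
```

So in practice the full header value as received (including `boundary=...`) is what should be handed to the decoder.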

score 4 · Answer 2 · edited Aug 17 '20 at 00:05

If you want to use Python's CGI,

import base64
from cgi import parse_multipart, parse_header
from io import BytesIO

c_type, c_data = parse_header(event['headers']['Content-Type'])
assert c_type == 'multipart/form-data'
decoded_string = base64.b64decode(event['body'])
#For Python 3: these two lines of bugfixing are mandatory
#see also: https://stackoverflow.com/questions/31486618/cgi-parse-multipart-function-throws-typeerror-in-python-3
c_data['boundary'] = bytes(c_data['boundary'], "utf-8")
c_data['CONTENT-LENGTH'] = event['headers']['Content-length']
form_data = parse_multipart(BytesIO(decoded_string), c_data)

for image_str in form_data['file']:
    ...
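
Since the `cgi` module is deprecated in newer Python versions, here is a rough standard-library alternative based on the `email` package. This is a sketch under assumptions, not a drop-in replacement: the function name `parse_multipart_with_email` is hypothetical, the body is assumed to be already decoded to bytes, and the trick is simply to prepend the `Content-Type` header so the email parser sees a complete MIME message.

```python
from email import policy
from email.parser import BytesParser


def parse_multipart_with_email(body, content_type):
    """Parse a multipart/form-data body (bytes) into {name: (filename, data)}."""
    # The email parser needs the Content-Type header (with the boundary)
    # in front of the body to recognise the multipart structure.
    prefix = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = BytesParser(policy=policy.HTTP).parsebytes(prefix + body)

    fields = {}
    for part in message.iter_parts():
        name = part.get_param("name", header="content-disposition")
        filename = part.get_filename()  # None for plain form fields
        data = part.get_payload(decode=True)  # raw bytes of the part
        fields[name] = (filename, data)
    return fields


sample_body = (
    b"--ce560532019a77d83195f9e9873e16a1\r\n"
    b'Content-Disposition: form-data; name="author"\r\n\r\n'
    b"John Smith\r\n"
    b"--ce560532019a77d83195f9e9873e16a1\r\n"
    b'Content-Disposition: form-data; name="file"; filename="example2.txt"\r\n'
    b"Content-Type: text/plain\r\nExpires: 0\r\n\r\n"
    b"Hello World\r\n"
    b"--ce560532019a77d83195f9e9873e16a1--\r\n"
)
sample_type = "multipart/form-data; boundary=ce560532019a77d83195f9e9873e16a1"
parsed = parse_multipart_with_email(sample_body, sample_type)
print(parsed)
```

Repeated field names overwrite each other in this sketch; collect values into a list if a form can send the same name more than once.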

cesartalves · Answer 3 · 2020-01-21T13:40:18.377

Expanding on Sam Anthony's answer (I had to make some fixes for it to work on Python 3.6.8):

from requests_toolbelt.multipart import decoder

multipart_string = b"--ce560532019a77d83195f9e9873e16a1\r\nContent-Disposition: form-data; name=\"author\"\r\n\r\nJohn Smith\r\n--ce560532019a77d83195f9e9873e16a1\r\nContent-Disposition: form-data; name=\"file\"; filename=\"example2.txt\"\r\nContent-Type: text/plain\r\nExpires: 0\r\n\r\nHello World\r\n--ce560532019a77d83195f9e9873e16a1--\r\n"
content_type = "multipart/form-data; boundary=ce560532019a77d83195f9e9873e16a1"

for part in decoder.MultipartDecoder(multipart_string, content_type).parts:
    print(part.text)

John Smith
Hello World

To use this in Lambda, install the library into your project directory with `pip install requests-toolbelt --target=.` and then upload it along with your Lambda script.

Here's a working example:

from requests_toolbelt.multipart import decoder

def lambda_handler(event, context):

    content_type_header = event['headers']['Content-Type']

    body = event["body"].encode()

    response = ''
    for part in decoder.MultipartDecoder(body, content_type_header).parts:
        response += part.text + "\n"

    return {
        'statusCode': 200,
        'body': response
    }

This should be enough for your dependencies to be recognized. If they aren't, try using the "/python/lib/python3.6/site-packages" file structure inside the zip, with your Python script at the root.
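
The loops above only print each part's text, so they don't tell you which form field a part belongs to. A small sketch of recovering the field name and filename from a part's `Content-Disposition` header, using only the standard library. The helper `disposition_params` is hypothetical, and I'm assuming the header value may arrive as bytes (as far as I know, requests-toolbelt exposes part headers with bytes keys and values, e.g. `part.headers[b'Content-Disposition']`).

```python
from email.message import Message


def disposition_params(header_value):
    """Return (field_name, filename) from a Content-Disposition header value."""
    if isinstance(header_value, bytes):
        header_value = header_value.decode("utf-8")
    msg = Message()
    msg["Content-Disposition"] = header_value
    name = msg.get_param("name", header="content-disposition")
    filename = msg.get_filename()  # None when the part is not a file
    return name, filename


# Assumed usage inside the decoder loop:
#     name, filename = disposition_params(part.headers[b"Content-Disposition"])
file_part = disposition_params(b'form-data; name="file"; filename="example2.txt"')
text_part = disposition_params('form-data; name="author"')
print(file_part, text_part)
```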

score 3 · Answer 4 · answered Mar 03 '20 at 23:56

I had a bunch of weird encoding issues and also odd behavior with API Gateway: I originally received the body of the request as bytes, and then after redeploying started to receive it as base64. Anyway, this is the code that ended up working for me.

import json
import base64
import boto3
from requests_toolbelt.multipart import decoder

s3client = boto3.client("s3")
def lambda_handler(event, context):
    content_type_header = event['headers']['content-type']
    postdata = base64.b64decode(event['body']).decode('iso-8859-1')
    imgInput = ''
    lst = []
    for part in decoder.MultipartDecoder(postdata.encode('utf-8'), content_type_header).parts:
        lst.append(part.text)
    response = s3client.put_object(Body=lst[0].encode('iso-8859-1'), Bucket='test', Key='mypicturefinal.jpg')
    return {'statusCode': '200','body': 'Success', 'headers': { 'Content-Type': 'text/html' }}
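
One way to cope with the body sometimes arriving base64-encoded and sometimes not is to check the `isBase64Encoded` flag that API Gateway proxy events carry. A minimal sketch, assuming a proxy-integration event; the helper name `get_body_bytes` is made up, and encoding a plain-text body as UTF-8 is my assumption.

```python
import base64


def get_body_bytes(event):
    """Return the request body as bytes, decoding base64 only when flagged."""
    body = event.get("body") or ""
    if event.get("isBase64Encoded"):
        return base64.b64decode(body)
    # Assumption: a non-base64 body is text that can be encoded as UTF-8
    return body.encode("utf-8")


encoded_event = {"body": base64.b64encode(b"Hello World").decode("ascii"),
                 "isBase64Encoded": True}
plain_event = {"body": "Hello World", "isBase64Encoded": False}
print(get_body_bytes(encoded_event), get_body_bytes(plain_event))
```

The resulting bytes can then be passed straight to `decoder.MultipartDecoder` together with the `Content-Type` header.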

score 1 · Answer 5 · answered Aug 10 '23 at 14:17

The cgi module is unfortunately deprecated starting in Python 3.11 (and was removed entirely in Python 3.13).

If you can use the multipart library (the current cgi module documentation mentions it as a possible replacement), you can use its parse_form_data() function in an AWS Lambda function like this:

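import base64
import logging
from io import BytesIO

from multipart import parse_form_data

# Logger used in the example usage below
logger = logging.getLogger()
logger.setLevel(logging.INFO)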


def lambda_handler(event, context):
    """
    Process an HTTP POST request of encoding type "multipart/form-data".
    """

    # HTTP headers are case-insensitive
    headers = {k.lower():v for k,v in event['headers'].items()}

    # AWS API Gateway applies base64 encoding on binary data
    body = base64.b64decode(event['body'])

    # Parse the multipart form data
    environ = {
        'CONTENT_LENGTH': headers['content-length'],
        'CONTENT_TYPE': headers['content-type'],
        'REQUEST_METHOD': 'POST',
        'wsgi.input': BytesIO(body)
    }
    form, files = parse_form_data(environ)

    # Example usage...
    form_data = dict(form)
    logger.info(form_data)

    attachments = {key:{
            'filename': file.filename,
            'content_type': file.content_type,
            'size': file.size,
            'data': file.raw
        } for key,file in files.items()}
    logger.info(attachments)

score 0 · Answer 6 · answered Aug 19 '20 at 10:48

If using CGI, I recommend using FieldStorage:

from cgi import FieldStorage

fs = FieldStorage(fp=event['body'], headers=event['headers'], environ={'REQUEST_METHOD':'POST', 'CONTENT_TYPE':event['headers']['Content-Type'], })['file']
originalFileName = fs.filename
binaryFileData = fs.file.read()

see also: https://stackoverflow.com/a/38718958/10913265

If the event body contains multiple files:

fs = FieldStorage(fp=event['body'], headers=event['headers'], environ={'REQUEST_METHOD':'POST', 'CONTENT_TYPE':event['headers']['Content-Type'], })['file']

delivers a list of FieldStorage objects. So you can do:

for f in fs:
    originalFileName = f.filename
    binaryFileData = f.file.read()

Altogether, here is my solution for handling a single file, multiple files, or a body containing no file, while also ensuring that the request is multipart/form-data:

from cgi import parse_header, FieldStorage

#see also: https://stackoverflow.com/a/56405982/10913265
c_type, c_data = parse_header(event['headers']['Content-Type'])
assert c_type == 'multipart/form-data'

#see also: https://stackoverflow.com/a/38718958/10913265
fs = FieldStorage(fp=event['body'], headers=event['headers'], environ={'REQUEST_METHOD':'POST', 'CONTENT_TYPE':event['headers']['Content-Type'], })['file']

#If fs is a single FieldStorage (one file or no file), wrap it in a list so it can be iterated
if not isinstance(fs, list):
    fs = [fs]

for f in fs:
    originalFileName = f.filename
    #no file: 
    if originalFileName == '':
        continue
    binaryFileData = f.file.read()
    #Do something with the data

this returned `TypeError: fp must be file pointer Traceback (most recent call last)` — user3821178, Jan 13 '21 at 05:54
