
I'm trying to do something, and I don't know whether it's a reasonable approach.

Problem:

I have a web client and a web server. The server (written in Python with Flask) processes a PDF file to extract some data; the client just sends the PDF file and waits for the response. The thing is that the client may have several PDF files to process, and I want to send all of them from the client to the server in a single request.

What I have planned to do:

I was thinking of converting the Blob of each PDF to a string and sending a POST request with a JSON body like this:

BODY:
{
    "content":[
        {"name": "pdf_name_1.pdf", "data": "some blob data converted to string"},
        {"name": "pdf_name_2.pdf", "data": "some blob data converted to string"},
        {"name": "pdf_name_3.pdf", "data": "some blob data converted to string"},
        ...
    ]
}

Then, on the server, I was thinking of converting the data back into a blob (bytes) so that I can write the PDF to disk and start processing it.

My question:

Is there a way, in Python, to convert the string representation of the PDF back to bytes so that I can write the PDF to disk?

Thanks a lot. If someone comes up with another way to send a bunch of PDFs in a single request, please let me know.

PS: I'm using Python 3.5 and Flask for the web server.

  • Related: [Uploading multiple files in a single request using python requests module](https://stackoverflow.com/q/18179345/2823755), [Download multiple CSVs using Flask?](https://stackoverflow.com/q/28568687/2823755). There are other possibly valid search hits. Maybe the easiest would be to make a single zip file and send it. Your question is a little too broad; please take the time to read [mcve] and [ask] and the other links found on that page. – wwii Sep 06 '18 at 16:14
  • If you use base-64 encoding you can easily turn the blob back into binary with the [`base64` module](https://docs.python.org/3/library/base64.html). – Mark Ransom Sep 06 '18 at 16:17
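
The single-zip idea mentioned in the comments could be sketched roughly as follows. This is only an illustration using the standard-library `zipfile` module; the helper names `pack_pdfs` and `unpack_pdfs` are made up for this example, and how the resulting bytes travel between client and server is left open.

```python
import io
import zipfile


def pack_pdfs(named_blobs):
    """Pack a {filename: bytes} mapping into one in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, blob in named_blobs.items():
            archive.writestr(name, blob)
    return buffer.getvalue()


def unpack_pdfs(zip_bytes):
    """Return a {filename: bytes} mapping for every member of the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```

The client would call `pack_pdfs` and upload the result as a single file; the server would call `unpack_pdfs` on the uploaded bytes.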

1 Answer


In such cases, it's preferable to send the file data with the `files` keyword of `requests.post`, like so:

import requests


def send_pdf_data(filename_list):
    # Map each multipart field name to a (filename, file object, MIME type) tuple.
    files = {}
    for index, filename in enumerate(filename_list):
        # str.format() instead of an f-string, so this also runs on Python 3.5.
        field_name = "pdf_name_{}.pdf".format(index)
        files[field_name] = (filename, open(filename, 'rb'), 'application/pdf')

    data = {}
    # *Put whatever you want in data dict*

    try:
        return requests.post("http://yourserveradders", data=data, files=files)
    finally:
        # Close every file handle once the request is done.
        for _, file_obj, _ in files.values():
            file_obj.close()


def main():
    filename_list = ["pdf_name_1.pdf", "pdf_name_2.pdf"]
    send_pdf_data(filename_list)

if __name__ == '__main__':
    main()

However, if you really want to pass the data as JSON, you should use the `base64` module, as @Mark Ransom mentioned.

You can implement it this way:

import requests
import base64


def encode(data: bytes) -> str:
    """
    Return base-64 encoded value of binary data,
    as a str so that it can be serialized to JSON.
    """
    return base64.b64encode(data).decode('ascii')


def decode(data: str) -> bytes:
    """
    Return decoded value of a base-64 encoded string.
    """
    return base64.b64decode(data.encode('ascii'))


def get_pdf_data(filename):
    """
    Open pdf file in binary mode,
    return a string encoded in base-64.
    """
    with open(filename, 'rb') as file:
        return encode(file.read())


def send_pdf_data(filename_list, encoded_pdf_data):
    data = {}
    # *Put whatever you want in data dict*
    # Create content list: one {"name", "data"} dict per pdf.
    content = [{"name": filename, "data": pdf_data}
               for (filename, pdf_data) in zip(filename_list, encoded_pdf_data)]
    data["content"] = content

    # json= serializes the dict and sets the Content-Type header to application/json.
    return requests.post("http://yourserveradders", json=data)


def main():
    filename_list = ["pdf_name_1.pdf", "pdf_name_2.pdf"]
    pdf_blob_data = [get_pdf_data(filename) for filename
                     in filename_list]
    send_pdf_data(filename_list, pdf_blob_data)

if __name__ == '__main__':
    main()
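
On the server side, turning the strings back into bytes and writing them to disk might look like the sketch below. It assumes the payload has the `{"content": [{"name": ..., "data": ...}]}` shape from the question; `save_pdfs` and `out_dir` are hypothetical names introduced here. In a Flask view, the `payload` dict would typically come from `request.get_json()`.

```python
import base64
import os


def save_pdfs(payload, out_dir):
    """Decode each base-64 "data" string in payload["content"] and write it into out_dir.

    Returns the list of paths that were written.
    """
    written = []
    for item in payload["content"]:
        # basename() drops any directory part a client might put in the name.
        name = os.path.basename(item["name"])
        pdf_bytes = base64.b64decode(item["data"])
        path = os.path.join(out_dir, name)
        with open(path, "wb") as pdf_file:
            pdf_file.write(pdf_bytes)
        written.append(path)
    return written
```

The files are opened with `"wb"` because the decoded value is `bytes`, not text.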
Federico Rubbi
  • Thanks a lot. I think it's better to send JSON with the strings than to send all the files (each file can be larger than 1 MB, and there can be more than 1000 files). – Kevin mendieta perez Sep 06 '18 at 19:08
  • Another suggestion: you could send several requests at the same time, each one carrying some of the PDF files as JSON. It would speed up your code considerably. If you're interested, take a look here: https://docs.python.org/3/library/multiprocessing.html – Federico Rubbi Sep 06 '18 at 19:28
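
The last suggestion, sending several smaller requests at once, could be sketched as below. Note that this uses a thread pool from `concurrent.futures` rather than the `multiprocessing` module linked above, on the assumption that the work is mostly waiting on the network. `chunked`, `send_in_batches`, and the `send_batch` callable are hypothetical names; `send_batch` stands for whatever function posts one batch of files.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def send_in_batches(filenames, send_batch, batch_size=50, workers=4):
    """Call send_batch(batch) for every batch, running up to `workers` calls at a time."""
    batches = chunked(filenames, batch_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the results in the same order as the batches.
        return list(pool.map(send_batch, batches))
```

The `batch_size` and `workers` defaults are arbitrary placeholders and would need tuning against the real server.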