
I am trying to save an uploaded file manually, for learning purposes, without the help of request.files.

I am using Flask, and print(type(request.data)) tells me the raw body data is of type str.

  1. I'm confused. Shouldn't I get binary data (<type bytes>) instead?

But then again, even if I get binary data, how can I skip the first several lines and start reading the binary data from the right place?

For example:

-----------------------------1699415032232102060211780227
Content-Disposition: form-data; name="myfile"; filename="Screenshot from 2018-10-05 15-49-07.png"
Content-Type: image/png

�PNG

�ߧd�tEXtSoftwaregnome-screenshot��>�IDATx���OPY����l�*c���=��El"f[��)3��S�+z-v�0�c������zp����6��qS�\W��6S�qM�S=tG�Ǩb��A�ؒvc���@rh��.N]���?JK����b+�J��(�����OR�T
-----------------------------1699415032232102060211780227--

  2. Could someone teach me how to save the file data manually?
Rick
  • I don't understand what you mean by "manually"? – roganjosh Aug 13 '19 at 18:44
  • @roganjosh Retrieving the (PNG) file data and saving it on the server using Python's file input/output functions. – Rick Aug 13 '19 at 18:45
  • So, without using `.save()` as in [this](https://stackoverflow.com/questions/46792270/saving-an-uploaded-file-to-disk-doesnt-work-in-flask)? What issue are you trying to get around? – roganjosh Aug 13 '19 at 18:50
  • @roganjosh Yes, without using `save()` provided by the framework. I just want to know how I could save the file manually, just like `save()` does. – Rick Aug 13 '19 at 18:52
  • The `save` method is [here](https://github.com/pallets/werkzeug/blob/master/src/werkzeug/datastructures.py#L2778), so you might piece together the process it goes through – roganjosh Aug 13 '19 at 19:08

1 Answer


Finally I figured it out by myself.

  1. The reason I was getting str from request.data is that I was using the Python 2.7 Flask packages, and in Python 2.7 bytes is just an alias of str, so that str already held the raw bytes. Such a pain in the ass. I will definitely use virtualenv next time, even for testing. But I still don't understand why I could use Python 3 syntax in the server code while the packages were for Python 2.7.

  2. So with Python 3 I actually get raw bytes like b'raw binary data' from request.data, while data from other attributes, e.g. request.form['firstname'], has already been decoded.

So now the question boils down to how I can rebuild the file from the binary data.


First, I prepared 2 small files for testing.

file1: 1.txt

content: 1234567

file2: test.png, a very small image

content (read with open('test.png', 'rb').read()):

b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\t\x00\x00\x00\x08\x08\x02\x00\x00\x00\xa4\xafB\xe2\x00\x00\x00\x03sBIT\x08\x08\x08\xdb\xe1O\xe0\x00\x00\x00\x10tEXtSoftware\x00Shutterc\x82\xd0\t\x00\x00\x00\x15IDAT\x08\xd7c\xd4\xe5Tb\xc0\x01\x98\x18p\x83\xa1"\x07\x00T;\x00h\xb9\x9335\x00\x00\x00\x00IEND\xaeB`\x82'

So the request.data I see on the server is:

b'-----------------------------16866548741414816351605255076\r\nContent-Disposition: form-data; name="myfile"; filename="1.txt"\r\nContent-Type: text/plain\r\n\r\n1234567\r\n-----------------------------16866548741414816351605255076\r\nContent-Disposition: form-data; name="myfile2"; filename="test.png"\r\nContent-Type: image/png\r\n\r\n\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\t\x00\x00\x00\x08\x08\x02\x00\x00\x00\xa4\xafB\xe2\x00\x00\x00\x03sBIT\x08\x08\x08\xdb\xe1O\xe0\x00\x00\x00\x10tEXtSoftware\x00Shutterc\x82\xd0\t\x00\x00\x00\x15IDAT\x08\xd7c\xd4\xe5Tb\xc0\x01\x98\x18p\x83\xa1"\x07\x00T;\x00h\xb9\x9335\x00\x00\x00\x00IEND\xaeB`\x82\r\n-----------------------------16866548741414816351605255076--\r\n'
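If you want to reproduce this body without a browser, for example to test the parsing steps below, here is a minimal sketch that rebuilds it from the two test files' bytes. The helper name build_multipart is hypothetical, and the layout (delimiter line, headers, blank line, data, b'\r\n', and a closing delimiter followed by --) is simply copied from the body shown above.

```python
# Delimiter line copied from the request body above.
DELIMITER = b'-----------------------------16866548741414816351605255076'

# Bytes of the two test files, as shown above.
TXT_BYTES = b'1234567'
PNG_BYTES = b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\t\x00\x00\x00\x08\x08\x02\x00\x00\x00\xa4\xafB\xe2\x00\x00\x00\x03sBIT\x08\x08\x08\xdb\xe1O\xe0\x00\x00\x00\x10tEXtSoftware\x00Shutterc\x82\xd0\t\x00\x00\x00\x15IDAT\x08\xd7c\xd4\xe5Tb\xc0\x01\x98\x18p\x83\xa1"\x07\x00T;\x00h\xb9\x9335\x00\x00\x00\x00IEND\xaeB`\x82'


def build_multipart(delimiter, parts):
    """Hypothetical helper: parts is a list of (name, filename, content_type, data) bytes tuples."""
    body = b''
    for name, filename, content_type, data in parts:
        body += delimiter + b'\r\n'
        body += (b'Content-Disposition: form-data; name="' + name
                 + b'"; filename="' + filename + b'"\r\n')
        body += b'Content-Type: ' + content_type + b'\r\n\r\n'  # blank line ends the headers
        body += data + b'\r\n'                                    # CRLF before the next delimiter
    body += delimiter + b'--\r\n'                                 # closing delimiter
    return body


raw_data = build_multipart(DELIMITER, [
    (b'myfile', b'1.txt', b'text/plain', TXT_BYTES),
    (b'myfile2', b'test.png', b'image/png', PNG_BYTES),
])
```

The result is meant to have the same shape as the request.data printed above, so it can be fed to the splitting code without running a server.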

Formatted a little (this version can't be used directly, because I added extra line breaks for display):

b'-----------------------------16866548741414816351605255076\r\n  
Content-Disposition: form-data; name="myfile"; filename="1.txt"\r\n
Content-Type: text/plain\r\n\r\n
1234567\r\n
-----------------------------16866548741414816351605255076\r\n
Content-Disposition: form-data; name="myfile2"; filename="test.png"\r\n
Content-Type: image/png\r\n\r\n
\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\t\x00\x00\x00\x08\x08\x02\x00\x00\x00\xa4\xafB\xe2\x00\x00\x00\x03sBIT\x08\x08\x08\xdb\xe1O\xe0\x00\x00\x00\x10tEXtSoftware\x00Shutterc\x82\xd0\t\x00\x00\x00\x15IDAT\x08\xd7c\xd4\xe5Tb\xc0\x01\x98\x18p\x83\xa1"\x07\x00T;\x00h\xb9\x9335\x00\x00\x00\x00IEND\xaeB`\x82\r\n
-----------------------------16866548741414816351605255076--\r\n'

Let raw_data be the binary data above.

  1. files_data_array = raw_data.split(b'-----------------------------16866548741414816351605255076\r\n')

Then I get a list with each file at a different index.

Here files_data_array[0] is the empty bytes before the first delimiter, files_data_array[1] contains the first file's meta info and data, files_data_array[2] contains the second file's meta info and data, and so on if you have more files.

[b'', b'Content-Disposition: form-data; name="myfile"; filename="1.txt"\r\nContent-Type: text/plain\r\n\r\n1234567\r\n', b'Content-Disposition: form-data; name="myfile2"; filename="test.png"\r\nContent-Type: image/png\r\n\r\n\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\t\x00\x00\x00\x08\x08\x02\x00\x00\x00\xa4\xafB\xe2\x00\x00\x00\x03sBIT\x08\x08\x08\xdb\xe1O\xe0\x00\x00\x00\x10tEXtSoftware\x00Shutterc\x82\xd0\t\x00\x00\x00\x15IDAT\x08\xd7c\xd4\xe5Tb\xc0\x01\x98\x18p\x83\xa1"\x07\x00T;\x00h\xb9\x9335\x00\x00\x00\x00IEND\xaeB`\x82\r\n-----------------------------16866548741414816351605255076--\r\n']
  2. file2_data = files_data_array[2]
b'Content-Disposition: form-data; name="myfile2"; filename="test.png"\r\nContent-Type: image/png\r\n\r\n\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\t\x00\x00\x00\x08\x08\x02\x00\x00\x00\xa4\xafB\xe2\x00\x00\x00\x03sBIT\x08\x08\x08\xdb\xe1O\xe0\x00\x00\x00\x10tEXtSoftware\x00Shutterc\x82\xd0\t\x00\x00\x00\x15IDAT\x08\xd7c\xd4\xe5Tb\xc0\x01\x98\x18p\x83\xa1"\x07\x00T;\x00h\xb9\x9335\x00\x00\x00\x00IEND\xaeB`\x82\r\n-----------------------------16866548741414816351605255076--\r\n'
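Hard-coding the delimiter only works for this one request, because the browser picks a new boundary for every upload. The boundary is announced in the request's Content-Type header (multipart/form-data; boundary=...), and each delimiter line in the body is -- followed by that boundary. In Flask the header value should be available via request.headers.get('Content-Type'). Below is a minimal sketch with hypothetical helper names and a made-up boundary value; it derives the delimiter and splits the body into parts, dropping the empty first element and the closing delimiter.

```python
def boundary_from_content_type(content_type):
    """Hypothetical helper: pull the boundary parameter out of a Content-Type value."""
    for param in content_type.split(';')[1:]:
        key, _, value = param.strip().partition('=')
        if key.lower() == 'boundary':
            return value.strip('"').encode('ascii')  # the boundary may be quoted
    raise ValueError('no boundary in Content-Type')


def split_parts(raw_data, boundary):
    """Hypothetical helper: return the parts between the delimiter lines."""
    delimiter = b'--' + boundary
    # Drop the closing delimiter (delimiter + b'--') and anything after it.
    body = raw_data.split(delimiter + b'--', 1)[0]
    # The first chunk is whatever precedes the first delimiter line (here b'').
    return body.split(delimiter + b'\r\n')[1:]


content_type = 'multipart/form-data; boundary=demo-boundary-123'  # made-up example value
boundary = boundary_from_content_type(content_type)
raw_data = (b'--demo-boundary-123\r\n'
            b'Content-Disposition: form-data; name="myfile"; filename="1.txt"\r\n'
            b'Content-Type: text/plain\r\n\r\n'
            b'1234567\r\n'
            b'--demo-boundary-123\r\n'
            b'Content-Disposition: form-data; name="myfile2"; filename="test.png"\r\n'
            b'Content-Type: image/png\r\n\r\n'
            b'\x89PNG fake bytes\r\n'
            b'--demo-boundary-123--\r\n')
# Every part now ends with the b'\r\n' that preceded the next delimiter line.
parts = split_parts(raw_data, boundary)
```

Cutting off the closing delimiter first means the last part looks like every other part, so the later trimming step doesn't need a special case.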

Then split off the meta info with file2_meta_info = file2_data.split(b'\r\n\r\n', maxsplit=1)[0]. Notice that I am splitting binary data here, and because the file data itself may contain b'\r\n\r\n', setting maxsplit=1 is necessary so that only the first blank line (the end of the headers) is used.

Now file2_meta_info is b'Content-Disposition: form-data; name="myfile2"; filename="test.png"\r\nContent-Type: image/png', which I can decode to get whatever meta info I want.
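A small sketch of that header/body split for a single part, plus one way to pull name and filename out of the Content-Disposition line. The helper names parse_part and disposition_params are hypothetical, and the regular expression only handles simple quoted values like the ones above (no escaped quotes, no filename*= encoding). The sample body deliberately contains b'\r\n\r\n' to show why maxsplit=1 matters.

```python
import re


def parse_part(part):
    """Hypothetical helper: split one part into (headers dict, body bytes)."""
    # Split only at the FIRST blank line; binary data may contain b'\r\n\r\n' too.
    meta, body = part.split(b'\r\n\r\n', maxsplit=1)
    headers = {}
    for line in meta.decode('utf-8').split('\r\n'):
        key, _, value = line.partition(':')
        headers[key.strip().lower()] = value.strip()
    return headers, body


def disposition_params(value):
    """Hypothetical helper: return simple key="value" pairs from Content-Disposition."""
    return dict(re.findall(r'(\w+)="([^"]*)"', value))


part = (b'Content-Disposition: form-data; name="myfile2"; filename="test.png"\r\n'
        b'Content-Type: image/png\r\n\r\n'
        b'\x89PNG\r\n\x1a\n fake data with a blank line \r\n\r\n inside\r\n')
headers, body = parse_part(part)
params = disposition_params(headers['content-disposition'])
print(params['name'], params['filename'], headers['content-type'])
# myfile2 test.png image/png
```

Without maxsplit=1, part.split(b'\r\n\r\n') would return three pieces here and the file data would be cut in the middle.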

Now turn to the file body data itself, file2_body_data = file2_data.split(b'\r\n\r\n', maxsplit=1)[1]

I get

b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\t\x00\x00\x00\x08\x08\x02\x00\x00\x00\xa4\xafB\xe2\x00\x00\x00\x03sBIT\x08\x08\x08\xdb\xe1O\xe0\x00\x00\x00\x10tEXtSoftware\x00Shutterc\x82\xd0\t\x00\x00\x00\x15IDAT\x08\xd7c\xd4\xe5Tb\xc0\x01\x98\x18p\x83\xa1"\x07\x00T;\x00h\xb9\x9335\x00\x00\x00\x00IEND\xaeB`\x82\r\n-----------------------------16866548741414816351605255076--\r\n'

Compared with the content of test.png shown at the beginning, I still need to cut off some trailing bytes: the b'\r\n' after the data and the closing delimiter line.

real_file2_body_data = file2_body_data.rsplit(b'\r\n', maxsplit=2)[0]
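That rsplit works here because test.png is the last part, whose body ends with b'\r\n' + closing delimiter + b'\r\n'. Any other part ends with just the b'\r\n' before the next delimiter line, so cutting two separators would also eat a line break that belongs to the file, e.g. a text file ending in b'\r\n'. A minimal sketch of a trim that handles both cases (trim_body is a hypothetical name; it assumes the closing line is the delimiter followed by --, as in the body above):

```python
DELIMITER = b'-----------------------------16866548741414816351605255076'  # from the body above


def trim_body(body, delimiter):
    """Hypothetical helper: strip what follows the file data in one part."""
    head, sep, _ = body.rpartition(b'\r\n' + delimiter + b'--')
    if sep:                      # last part: drop CRLF + closing delimiter (+ anything after)
        return head
    if body.endswith(b'\r\n'):   # other parts: drop only the CRLF before the next delimiter
        return body[:-2]
    raise ValueError('unexpected end of part')


print(trim_body(b'1234567\r\n', DELIMITER))  # b'1234567'
```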

Finally I can rebuild the file with:

# 'wb' = write in binary mode; the with block closes the file automatically
with open('test2.png', 'wb') as f:
    f.write(real_file2_body_data)
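For comparison with the save() method linked in the comments: roughly, Werkzeug's FileStorage.save() opens the destination in binary mode and copies the upload's stream into it in chunks with shutil.copyfileobj. The sketch below does something similar for bytes already in memory. It is not Werkzeug's actual code; the helper name save_bytes, the buffer size, and the stand-in data are assumptions.

```python
import io
import os
import shutil
import tempfile


def save_bytes(data, path, buffer_size=16384):
    """Hypothetical helper: write bytes to path in chunks, similar in spirit to save()."""
    with open(path, 'wb') as dst:  # binary mode: bytes are written unchanged
        shutil.copyfileobj(io.BytesIO(data), dst, buffer_size)


# Stand-in for real_file2_body_data from the steps above.
real_file2_body_data = b'\x89PNG\r\n\x1a\n' + b'stand-in data'
path = os.path.join(tempfile.mkdtemp(), 'test2.png')
save_bytes(real_file2_body_data, path)

with open(path, 'rb') as f:
    assert f.read() == real_file2_body_data
```

Copying in chunks only pays off when the source is a real stream; with everything already in memory, a single f.write() as above does the same job.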

Done!

Rick
  • Sorry for the messy variable naming :P. By reinventing the wheel, I now feel comfortable about what's happening under the hood. For other languages that don't provide `b'binary data'.split()`, one can always use a lower-level function to search for the specific bytes and do the same splitting. – Rick Aug 14 '19 at 09:00
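To illustrate the comment above: the same splitting can be done with nothing but a byte search and slicing, which maps more directly onto languages without a bytes split(). A sketch in Python using only bytes.find (the helper name split_on is hypothetical), meant to behave like bytes.split(sep) for a non-empty separator:

```python
def split_on(data, sep):
    """Hypothetical helper: split data on sep using only find() and slicing."""
    if not sep:
        raise ValueError('empty separator')  # bytes.split raises here too
    pieces = []
    start = 0
    while True:
        idx = data.find(sep, start)
        if idx == -1:                  # no more separators: keep the rest
            pieces.append(data[start:])
            return pieces
        pieces.append(data[start:idx])
        start = idx + len(sep)         # continue searching after this separator


print(split_on(b'--B\r\nfirst\r\n--B\r\nsecond\r\n--B--\r\n', b'--B\r\n'))
# [b'', b'first\r\n', b'second\r\n--B--\r\n']
```

The result has the same shape as files_data_array in the answer: an empty first element, then one element per part, with the closing delimiter still attached to the last one.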