
I am trying to use the requests library in Python to upload a file into a Fedora Commons repository on localhost. I'm fairly certain my main problem is that I don't understand open() / read() and what I need to do to send data with an HTTP request.

import requests
from requests.auth import HTTPBasicAuth

def postBinary(fileName, dirPath, url):
    path = dirPath + '/' + fileName
    print('to ' + url + '\n' + path)
    openBin = {'file': (fileName, open(path, 'rb').read())}
    headers = {'Slug': fileName}  # not important
    r = requests.put(url, files=openBin, headers=headers, auth=HTTPBasicAuth('username', 'pass'))
    print(r.text)
    print("and the url used:")
    print(r.url)

This successfully uploads a file to the repository, but the stored file is slightly larger than the original and corrupted. For example, an image that was 6.6 KB became 6.75 KB and could no longer be opened.

So how should I properly open and upload a file using PUT in Python?

Extra details:

  • When I replace files=openBin with data=openBin, the stored file contains my dictionary, with what I presume is the file data as a string, and its size increases to several megabytes. I don't know whether that information is helpful. The content begins:
    file=FILE_NAME.extension&file=TYPE89a%24%02Q%03%E7%FF%00E%5B%19%FC%....

  • I am specifically using PUT because the Fedora RESTful HTTP API endpoint says to use PUT.

The following command does work:

curl -u username:password -H "Content-Type: text/plain" -X PUT -T /path/to/someFile.jpeg http://localhost:8080/fcrepo/rest/someFile.jpeg

d-cubed
awscott
  • Take a look [here](https://stackoverflow.com/questions/29104107/upload-image-using-post-form-data-in-python-requests) (using `POST` method) or [here](https://stackoverflow.com/questions/22567306/python-requests-file-upload) (using `PUT` method). – Mauro Baraldi Dec 12 '17 at 21:57
  • The POST information didn't help me; I get an exception saying wrong protocol for socket. I have already tried the PUT method link you sent as well, also with no luck :( It created a file that is corrupted – awscott Dec 12 '17 at 22:41

1 Answer


Updated

Using requests.put() with the files parameter sends a multipart/form-data encoded request which the server does not seem to be able to handle without corrupting the data, even when the correct content type is declared.
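To make the size increase concrete, here is a minimal sketch that hand-builds a single-part multipart/form-data body around some stand-in bytes and compares it with the raw payload. The helper name `multipart_body`, the stand-in payload, and the file name are assumptions for illustration only; the exact boundary and header layout that requests generates may differ, but the framing adds overhead of the same order as the growth reported in the question.

```python
import uuid


def multipart_body(field, filename, payload, content_type="application/octet-stream"):
    """Hand-build a single-part multipart/form-data body (illustration only)."""
    boundary = uuid.uuid4().hex
    # Part headers that precede the file bytes inside the request body.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n"
        "\r\n"
    ).encode("ascii")
    # Closing boundary that follows the file bytes.
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + payload + tail


# Stand-in for the image bytes; any binary payload behaves the same way.
raw = b"GIF89a" + bytes(range(256)) * 4
wrapped = multipart_body("file", "someFile.gif", raw)
overhead = len(wrapped) - len(raw)

print("raw bytes:", len(raw))
print("multipart bytes:", len(wrapped))
print("overhead:", overhead)
```

If the server stores the request body verbatim, the saved file starts with the boundary line rather than the image signature, which would explain why the image no longer opens.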

The curl command simply performs a PUT with the raw data contained in the body of the request. You can create a similar request by passing the file data in the data parameter. Specify the content type in the header:

headers = {'Content-type': 'image/jpeg', 'Slug': fileName}
r = requests.put(url, data=open(path, 'rb'), headers=headers, auth=('username', 'pass'))

You can vary the Content-type header to suit the payload as required.
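Putting this together, a possible version of the upload function might look like the sketch below. The function name `put_binary`, the use of `mimetypes.guess_type` to pick the header, and the fallback to application/octet-stream are assumptions, not something the Fedora API requires; the `with` block simply ensures the file is closed once the request has been sent.

```python
import mimetypes
import os

import requests


def put_binary(file_name, dir_path, url, auth):
    """PUT the raw bytes of a file as the request body (no multipart wrapper)."""
    path = os.path.join(dir_path, file_name)
    # Guess the content type from the file extension; fall back to a generic
    # binary type when the extension is unknown (an assumption of this sketch).
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    headers = {"Content-Type": content_type, "Slug": file_name}
    with open(path, "rb") as fh:
        # Passing the open file object as data= sends its bytes unchanged.
        response = requests.put(url, data=fh, headers=headers, auth=auth)
    response.raise_for_status()
    return response
```

It could then be called as `put_binary('someFile.jpeg', '/path/to', 'http://localhost:8080/fcrepo/rest/someFile.jpeg', ('username', 'password'))`, mirroring the curl command in the question.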


Try setting the Content-type for the file.

If you are sure that it is a text file, try text/plain, which you used in your curl command, even though you appear to be uploading a JPEG file. For a JPEG image, however, you should use image/jpeg.

Otherwise, for arbitrary binary data, you can use application/octet-stream. The content type goes in the third element of the tuple, shown here for a JPEG:

openBin = {'file': (fileName, open(path,'rb'), 'image/jpeg' )}

Also, it is not necessary to explicitly read the file contents in your code; requests will do that for you, so just pass the open file handle as shown above.

hd1
mhawke
  • Hey, I did try text/plain as the Content-Type header, and various others. This can be called for any kind of file, so I didn't specify one. text/plain still works with curl (the downloaded file is fine), but it does not work in Python with requests, and neither does 'image/jpeg' or 'application/octet-stream'. I did notice that the object has a property in the repository, **hasMimeType**, which is: multipart/form-data;boundary=9f74e4d3067e4ce482bdc9e311b58d2d. Does that help at all? – awscott Dec 12 '17 at 23:13
  • Also, thank you for the tip about read(). I had added that while debugging; I'll take it out. – awscott Dec 12 '17 at 23:17
  • @awscott: it seems that the server does not properly handle multipart/form formatted requests. I have updated my answer to show a simpler method that produces a similar request to that of the working `curl` command. – mhawke Dec 13 '17 at 00:34
  • You are correct, that worked! Thank you! I also set the Content-Type header to application/octet-stream as you suggested, and now it works properly. – awscott Dec 13 '17 at 13:17
  • r = requests.put(url, data=open(path, 'rb'), headers=headers, auth=('username', 'pass')) - Regarding this: if the file is very large, wouldn't it make more sense to stream it without reading it into memory? If so, how can that be done? – Simplecode Dec 10 '20 at 06:29
  • @Simplecode: [streaming uploads](https://requests.readthedocs.io/en/latest/user/advanced/#streaming-uploads) is the default behaviour when a file-like object, in this case an open file, is passed in the `data` argument. – mhawke Dec 31 '20 at 03:44