9

I want to make a POST request to an HTTPS site that should respond with a .csv file. I have this Python code:

import shutil
import urllib
import urllib2

url = 'https://www.site.com/servlet/datadownload'
values = {
  'val1' : '123',
  'val2' : 'abc',
  'val3' : '1b3',
}

data = urllib.urlencode(values)
req = urllib2.Request(url,data)
response = urllib2.urlopen(req)
myfile = open('file.csv', 'wb')
shutil.copyfileobj(response.fp, myfile)
myfile.close()

But I'm getting the error:

BadStatusLine: ''    (in httplib.py)

I've tried the post request with the Chrome Extension: Advanced REST client (screenshot) and that works fine.

What could be the problem, and how can I solve it? Is it because of HTTPS?


EDIT, refactored code:

try:
    # Using HTTPSConnection instead gives a BadStatusLine: '' error:
    #conn = httplib.HTTPSConnection(host="www.site.com", port=443)

    conn = httplib.HTTPConnection("www.site.com")
    params  = urllib.urlencode({'val1':'123','val2':'abc','val3':'1b3'})
    conn.request("POST", "/nps/servlet/exportdatadownload", params)
    content = conn.getresponse()
    print content.reason, content.status
    print content.read()
    conn.close()
except:
    import sys
    print sys.exc_info()[:2]

Output:

Found 302

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>302 Found</TITLE>
</HEAD><BODY>
<H1>Found</H1>
The document has moved <A HREF="https://www.site.com/nps/servlet/exportdatadownload">here</A>.<P>
<HR>
<ADDRESS>Oracle-Application-Server-10g/10.1.3.5.0 Oracle-HTTP-Server Server at mp-www1.mrco.be Port 7778</ADDRESS>
</BODY></HTML>

What am I doing wrong?

francisMi
  • What version of Python are you using? I would check [this answer](http://stackoverflow.com/a/2146467/532471) to see if httplib is working OK with HTTPS. I can't try out your code right now, but another piece of advice would be to use a friendlier library for your requests, called... [requests](http://docs.python-requests.org/en/latest/). – Geekfish Jan 17 '13 at 18:16
  • What do you get if you `https_handler = urllib2.HTTPSHandler(1)` `opener = urllib2.build_opener(https_handler)` `response = opener.open(req)` in place of `response = urllib2.urlopen(req)`? You should still get the error, but that should turn on debugging in the https response, which should mean that your response will be printed, which you can then use to help track down what isn't working. If it's for some odd reason using another handler, just try the same thing with `urllib2.HTTPHandler(1)` or whatever handler is relevant. – Silas Ray Jan 17 '13 at 19:49
  • I noticed that you are using urllib and urllib2 at the same time. Is that intentional? – Josh Jan 17 '13 at 20:38
  • You should post the site. –  Mar 05 '13 at 20:46

4 Answers

14

Is there a reason you've got to use urllib? Requests is simpler, better in almost every way, and abstracts away some of the cruft that makes urllib hard to work with.

As an example, I'd rework your example as something like:

import requests
resp = requests.post(url, data=values, allow_redirects=True)

At this point, the response from the server is available in resp.text (or as raw bytes in resp.content, which is what you'd write to a .csv file), and you can do what you'd like with it. If requests wasn't able to POST properly (because you need a custom SSL certificate, for example), it should give you a nice error message that tells you why.

Even if you can't do this in your production environment, do this in a local shell to see what error messages you get from requests, and use that to debug urllib.

Dan
  • The same error: BadStatusLine: `ConnectionError: HTTPSConnectionPool(host='www.site.com', port=443): Max retries exceeded with url: /nps/servlet/exportdatadownload/ (Caused by : '')` When I browse to `https://www.site.com/nps/servlet/exportdatadownload?val1=123&val2=abc&val3=1b3`, the Excel file is downloaded automatically, but still no success with a Python script... – francisMi Mar 06 '13 at 00:36
  • `BadStatusLine` means that the server sent back an HTTP status that Python doesn't understand (and it understands all the "normal" ones). From a command-line, can you do a `curl -I https://site.com` (with whatever the real URL is there) and paste the results? If you don't have `curl`, you can also use hurl.it (in which case I'm just interested in the first paragraph of the response). – Dan Mar 06 '13 at 01:38
3

The BadStatusLine: '' (in httplib.py) gives away that there might be something else going on here. This may happen when the server sends no reply back at all, and just closes the connection.
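To see that failure mode in isolation, here is a minimal sketch in Python 3, where httplib has been renamed http.client. The local "silent" server and the helper names hang_up_without_reply and post_to_silent_server are made up for illustration and are not part of the original problem. The server accepts the connection and closes it without sending a status line, and the client reports that as a BadStatusLine (Python 3 raises the subclass RemoteDisconnected).

```python
import http.client
import socket
import threading


def hang_up_without_reply(listener):
    """Accept one client, send nothing back, and close the connection."""
    conn, _ = listener.accept()
    conn.settimeout(5)
    conn.shutdown(socket.SHUT_WR)      # signal that no response is coming
    try:
        while conn.recv(4096):         # drain the request until the client closes
            pass
    except OSError:
        pass
    finally:
        conn.close()


def post_to_silent_server():
    """Return the exception raised when the server closes without replying."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5)
    listener.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=hang_up_without_reply, args=(listener,))
    worker.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("POST", "/servlet/datadownload", "val1=123")
        client.getresponse()
    except http.client.BadStatusLine as exc:
        return exc
    finally:
        client.close()
        worker.join()
        listener.close()
    return None


error = post_to_silent_server()
print(type(error).__name__, error)
```

If the real server behaves like this only over HTTPS, that suggests the connection is being dropped during or right after the handshake rather than the POST data being rejected.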

As you mentioned that you're using an SSL connection, this might be particularly interesting to debug (with curl -v URL if you want). If you find out that curl -2 URL (which forces the use of SSLv2) seems to work, while curl -3 URL (SSLv3), doesn't, you may want to take a look at issue #13636 and possibly #11220 on the python bugtracker. Depending on your Python version & a possibly misconfigured webserver, this might be causing a problem: the SSL defaults have changed in v2.7.3.
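The same kind of protocol experiment can be sketched from Python itself. This is an assumption-laden example rather than a fix: it uses Python 3.7+ (http.client and ssl.TLSVersion, neither of which exists in 2.7), the helper name pinned_tls_connection is made up, and TLS 1.2 stands in for whichever version you want to probe. It only builds the connection object; nothing is sent until request() is called.

```python
import http.client
import ssl


def pinned_tls_connection(host, version):
    """Build (without connecting) an HTTPS connection limited to one TLS version."""
    context = ssl.create_default_context()
    context.minimum_version = version
    context.maximum_version = version
    conn = http.client.HTTPSConnection(host, context=context, timeout=10)
    return conn, context


conn, context = pinned_tls_connection("www.site.com", ssl.TLSVersion.TLSv1_2)
# Calling conn.request("POST", "/servlet/datadownload", params) would now
# handshake with TLS 1.2 only; a failure at that point hints at a protocol
# mismatch between client and server.
```

Repeating the request with different pinned versions plays the same role as comparing curl -2 and curl -3.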

Cedric VB
1
import httplib, urllib

# _certfile is assumed to hold the path to a client certificate file.
conn = httplib.HTTPSConnection(host='www.site.com', port=443, cert_file=_certfile)
params = urllib.urlencode({'cmd': 'token', 'device_id_st': 'AAAA-BBBB-CCCC',
                           'token_id_st': 'DDDD-EEEE_FFFF', 'product_id': 'Unit Test',
                           'product_ver': '1.6.3'})
# The request path must start with a slash.
conn.request("POST", "/servlet/datadownload", params)
response = conn.getresponse()
content = response.read()
#print response.status, response.reason
conn.close()
bioffe
  • I've tried your code, but adapted the first line to just `httplib.HTTPSConnection('www.site.com')`. When I print `content.status` I get `Found 302`. And printing the content itself, I get HTML code with `The document has moved here.` But how do I get the found file? – francisMi Feb 03 '13 at 15:50
  • I've edited my question with more information and with your code. – francisMi Feb 03 '13 at 16:49
  • Try the URL `https://google.com`; it feels like you have some sort of server/destination issue. – bioffe Feb 03 '13 at 18:11
  • `httplib.HTTPSConnection(host="www.google.com", port=443)` gives a `Not Found 404` output and `httplib.HTTPConnection("www.google.com")` gives `Service Unavailable 503` – francisMi Feb 03 '13 at 19:03
  • That's good. There isn't a `/servlet/datadownload` URL on Google's website, hence the error. Now I am confident your server is the issue. Try to read something simple, like a static HTML page (that you can access via a browser). – bioffe Feb 04 '13 at 04:19
0

The server may not like the missing headers, particularly user-agent and content-type. The Chrome image shows what is used for these. Maybe try adding the headers:

import httplib, urllib

host = 'www.site.com'
url = '/servlet/datadownload'

values = {
  'val1' : '123',
  'val2' : 'abc',
  'val3' : '1b3',
}

headers = {
    'User-Agent': 'python',
    'Content-Type': 'application/x-www-form-urlencoded',
}

values = urllib.urlencode(values)

conn = httplib.HTTPSConnection(host)
conn.request("POST", url, values, headers)
response = conn.getresponse()

data = response.read()

print 'Response: ', response.status, response.reason
print 'Data:'
print data

This is untested code, and you may want to experiment by adding other header values to match your screenshot. Hope it helps.
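If you want to check what would go over the wire before sending anything, a rough Python 3 sketch (urllib.request and urllib.parse replace urllib2 and urllib there) is to build the request object and inspect it. The URL and values are the placeholders from the question, and the header values are the same guesses as above, not something the server is known to require.

```python
import urllib.parse
import urllib.request

url = "https://www.site.com/servlet/datadownload"
values = {"val1": "123", "val2": "abc", "val3": "1b3"}
headers = {
    "User-Agent": "python",
    "Content-Type": "application/x-www-form-urlencoded",
}

# In Python 3 the POST body must be bytes, so encode the form data.
body = urllib.parse.urlencode(values).encode("ascii")
req = urllib.request.Request(url, data=body, headers=headers)

# Nothing has been sent yet; these show what urlopen(req) would transmit.
print(req.get_method())
print(req.header_items())
print(req.data)
```

Comparing this output with the headers in the Chrome screenshot should show which ones are still missing.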

Fiver