13

I am trying to perform a simple POST request with urllib2. However, the server's response indicates that it received a plain GET. I checked the method of the outgoing request, and it is set to POST.
To check whether the server behaves as I expect, I tried a GET request with the (former POST) data concatenated to the URL. This gave me the answer I expected.
Does anybody have a clue what I misunderstood?

def connect(self):
    url = 'http://www.mitfahrgelegenheit.de/mitfahrzentrale/Dresden/Potsdam.html/'
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    header = { 'User-Agent' : user_agent }

    values = {
      'city_from' : 69,
      'radius_from' : 0,
      'city_to' : 263,
      'radius_to' : 0,
      'date' : 'date',
      'day' : 5,
      'month' : 3,
      'year' : 2012,
      'tolerance' : 0
    }

    data = urllib.urlencode(values)
    # req = urllib2.Request(url+data, None, header) # GET works fine
    req = urllib2.Request(url, data, header)  # POST request does not work

    self.response = urllib2.urlopen(req)

This looks similar to the problem discussed here: Python URLLib / URLLib2 POST, but I'm quite sure that in my case the trailing slash is not missing. ;)

I fear this might be a stupid misconception, but I have been puzzling over it for hours!
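For reference, the check on the outgoing request type mentioned above can be sketched roughly like this. It is a minimal sketch, not the exact check that was run: the import fallback for Python 3 and the shortened parameter set are assumptions added for illustration, and nothing is sent over the network.

```python
try:  # Python 2, as used in the question
    from urllib import urlencode
    from urllib2 import Request
except ImportError:  # Python 3 moved these names (assumption: either may be in use)
    from urllib.parse import urlencode
    from urllib.request import Request

url = 'http://www.mitfahrgelegenheit.de/mitfahrzentrale/Dresden/Potsdam.html/'
# Shortened parameter set, for illustration only
data = urlencode({'city_from': 69, 'city_to': 263}).encode('ascii')

post_req = Request(url, data)                  # a body is attached
get_req = Request(url + data.decode('ascii'))  # no body, data appended to the URL

print(post_req.get_method())  # POST
print(get_req.get_method())   # GET
```

So the Request object itself only tells you what urllib2 intends to send first; it says nothing about what happens after the server answers.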



EDIT: A convenience function for printing:

def response_to_str(response):
    return response.read()

def dump_response_to_file(response):
    with open('dump.html', 'w') as f:
        f.write(response_to_str(response))



EDIT 2: Resolution:

I found a tool to capture the real interaction with the site, http://fiddler2.com/fiddler2/. Apparently the server takes the data from the input form, redirects a few times, and then makes a GET request with this data simply appended to the URL.
Everything is fine with urllib2, and I apologize for wasting your time!
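To illustrate why a redirected POST can end up looking like a GET from the client side, here is a small sketch that asks urllib2's stock redirect handler what it would send after a 302 answer. The redirect target is a made-up placeholder (the real Location header is not shown in this thread), the import fallback for Python 3 is an assumption, and no network traffic happens.

```python
try:  # Python 2, as used in the question
    from urllib import urlencode
    from urllib2 import HTTPRedirectHandler, Request
except ImportError:  # Python 3 equivalents (assumption: either may be in use)
    from urllib.parse import urlencode
    from urllib.request import HTTPRedirectHandler, Request

url = 'http://www.mitfahrgelegenheit.de/mitfahrzentrale/Dresden/Potsdam.html/'
data = urlencode({'city_from': 69, 'city_to': 263}).encode('ascii')
original = Request(url, data)

# Hypothetical redirect target, for illustration only
new_url = 'http://www.example.com/redirect-target'

# Ask the default handler which request it would issue after a 302 response.
# fp (the response body) is not used on this code path, so None is passed.
followup = HTTPRedirectHandler().redirect_request(
    original, None, 302, 'Found', {}, new_url)

print(original.get_method())  # POST
print(followup.get_method())  # GET
print(followup.data)          # None: the form data is not carried along
```

In other words, once a redirect is involved, the request that finally produces the page can be a GET without the form data, even though the first request really was a POST.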

Zakum
  • But what is the answer you expected? And how are you sure this isn't a server-side problem? – declension Mar 02 '12 at 23:35
  • You can observe the behavior I expect by uncommenting line 19 (and commenting out line 20, of course). Since this gets me what I want, I assume the server works fine. To be precise, I want to receive all rides from Dresden to Potsdam on the 5th of March, but instead I get all the rides in the system. – Zakum Mar 02 '12 at 23:44
  • Can you post server side code too? – sholsapp Mar 03 '12 at 00:23
  • Unfortunately not, because I do not have access to it. – Zakum Mar 03 '12 at 00:27
  • Perhaps the server doesn't accept `POST` requests to this page then. – sholsapp Mar 03 '12 at 23:22

4 Answers

15

Things you need to check:

  • Are you sure you are posting to the right URL?
  • Are you sure you can retrieve results without being logged in?
  • Show us some example output for different post values.

You can find the correct POST URL using Firefox's Firebug or Google Chrome's DevTools.
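To see what urllib2 itself sends to that URL, one option is to switch on the HTTP debug output of the handler. This is only a sketch: the module alias `request_lib` is a name chosen here for illustration, the Python 3 fallback is an assumption, and the actual `open()` call is left commented out because it needs network access.

```python
try:  # Python 2, as used in this thread
    import urllib2 as request_lib
except ImportError:  # Python 3 equivalent (assumption: either may be in use)
    import urllib.request as request_lib

# debuglevel=1 makes the underlying HTTP connection print the request line,
# the headers it sends, and the reply headers to stdout.
debug_handler = request_lib.HTTPHandler(debuglevel=1)
opener = request_lib.build_opener(debug_handler)

# With a prepared request object `req`, each HTTP hop would then be logged:
# response = opener.open(req)
```

This only covers plain http:// URLs; for https:// an HTTPSHandler with the same debuglevel argument would be needed.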

I have provided some code below that supports cookies, so that you can log in first and then use the cookie to make the subsequent request with your POST parameters.

Finally, if you could show us some example HTML output, that will make life easier.

Here is my code, which has worked quite reliably for me so far for POSTing to most web pages, including pages protected against CSRF/XSRF (as long as you are able to figure out correctly what to post and which URL to post it to).

import cookielib
import socket
import urllib
import urllib2

url = 'http://www.mitfahrgelegenheit.de/mitfahrzentrale/Dresden/Potsdam.html/'
http_header = {
                "User-Agent" : "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.46 Safari/535.11",
                "Accept" : "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,text/png,*/*;q=0.5",
                "Accept-Language" : "en-us,en;q=0.5",
                "Accept-Charset" : "ISO-8859-1",
                "Content-type": "application/x-www-form-urlencoded",
                "Host" : "www.mitfahrgelegenheit.de",
                "Referer" : "http://www.mitfahrgelegenheit.de/mitfahrzentrale/Dresden/Potsdam.html/"
                }

params = {
  'city_from' : 169,
  'radius_from' : 0,
  'city_to' : 263,
  'radius_to' : 0,
  'date' : 'date',
  'day' : 5,
  'month' : 3,
  'year' : 2012,
  'tolerance' : 0
}

# setup socket connection timeout
timeout = 15
socket.setdefaulttimeout(timeout)

# setup cookie handler
cookie_jar = cookielib.LWPCookieJar()
cookie = urllib2.HTTPCookieProcessor(cookie_jar)

# set up a proxy mapping, in case you need to use a proxy server some day
proxy = {} # example: {"http" : "www.blah.com:8080"}

# create a urllib2 opener
#opener = urllib2.build_opener(urllib2.ProxyHandler(proxy), cookie) # with proxy
opener = urllib2.build_opener(cookie) # we are not going to use a proxy now

# create your HTTP request
req = urllib2.Request(url, urllib.urlencode(params), http_header)

# submit your request
res = opener.open(req)
html = res.read()

# save retrieved HTML to file
with open("tmp.html", "w") as f:
    f.write(html)
print html
gsbabil
1

Just to close the question:
The problem really was that the server did not expect a POST request (although it should, considering the use case). So (once again) the framework was not broken. ;)

Zakum
0

Try adding to your headers the pair:

   'Content-type': 'application/x-www-form-urlencoded'
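A minimal sketch of what that looks like with the headers from the question. Note that urllib2 adds this Content-type by default when a request carries data, so setting it explicitly mostly documents the intent. The shortened parameter set and the Python 3 import fallback are assumptions for illustration; nothing is sent over the network.

```python
try:  # Python 2, as used in the question
    from urllib import urlencode
    from urllib2 import Request
except ImportError:  # Python 3 equivalents (assumption: either may be in use)
    from urllib.parse import urlencode
    from urllib.request import Request

url = 'http://www.mitfahrgelegenheit.de/mitfahrzentrale/Dresden/Potsdam.html/'
header = {
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
    'Content-type': 'application/x-www-form-urlencoded',
}
# Shortened parameter set, for illustration only
data = urlencode({'city_from': 69, 'city_to': 263}).encode('ascii')
req = Request(url, data, header)

# Request stores header names via str.capitalize(), so look them up that way.
print(req.get_header('Content-type'))
```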
Not_a_Golfer
  • I just tried using your exact code here, watched it with wireshark, and it looks like a POST request to me. 211 23.544957 10.0.0.6 62.146.53.71 HTTP 414 POST /mitfahrzentrale/Dresden/Potsdam.html/ HTTP/1.1 (application/x-www-form-urlencoded) – Not_a_Golfer Mar 02 '12 at 23:35
  • I assume it really is a POST request, but it looks like the server redirects and changes it to a GET... Could you try out the GET request I commented out in line 19 and compare the result to that of the POST request, Dvir? I added a dump-to-HTML function to my question above, so it shouldn't take too much time. I would really appreciate that! It would at least show me that I did not go crazy staring at this thing. ;) – Zakum Mar 03 '12 at 00:01
  • I ran them both and the results are different. The page with the POST is ~38k, and the page with the GET is ~24k. – Not_a_Golfer Mar 03 '12 at 00:17
0

Try removing the trailing slash from your URL like this:

url = 'http://www.mitfahrgelegenheit.de/mitfahrzentrale/Dresden/Potsdam.html'

It may be the case that the server script your POST request is being sent to doesn't actually support POST requests.

sholsapp
  • Removing the trailing slash did not help (and doesn't seem to be a good idea according to http://stackoverflow.com/a/3239251/978912). Without the User-Agent header the server won't talk to me (responding with a 403) because it apparently does not like the default agent urllib2 submits. – Zakum Mar 02 '12 at 23:50
  • In your case, removing the trailing slash is correct because you've qualified the absolute path to the resource (assuming that `Potsdam.html` is a file and not a directory). – sholsapp Mar 03 '12 at 00:17
  • Ahh, thank you for the explanation! To be honest, in my despair I even tried a trailing question mark, which did not help either. – Zakum Mar 03 '12 at 00:25