
I'm starting to learn the Python requests module. For practice, I tried to solve a challenge/response problem: I want to access the data at http://lema.rae.es/drae/srv/search?val=hacer

With the "Tamper Data" plugin for Firefox I inspected the necessary HTTP requests:

GET http://lema.rae.es/drae/srv/search?val=hacer
POST http://lema.rae.es/drae/srv/search?val=hacer

I copied the exact headers Firefox sends in the two HTTP requests and reimplemented the JavaScript "challenge" function in Python. Then I do the following:

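import requests

url = "http://lema.rae.es/drae/srv/search?val=hacer"
headers = { ... }  # GET headers copied from Firefox
r1 = requests.get(url=url, headers=headers)
html = r1.content.decode("utf-8")
formdata = challenge(html)  # my Python port of the JavaScript challenge (see the full code)
headers = { ... }  # POST headers copied from Firefox
r2 = requests.post(url=url, data=formdata, headers=headers)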

Unfortunately, the server does not respond as expected. I checked all the headers I send via `r.request.headers`, and they match exactly the headers Firefox sends (according to Tamper Data).
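Comparing headers alone leaves out the request body, which Tamper Data shows as POSTDATA. A minimal sketch, assuming the form data is held as a list of `(name, value)` pairs (the field names and values below are hypothetical placeholders, not the real form's): requests can prepare a request without sending it, and the prepared request exposes the exact encoded body alongside the headers, so both can be compared with what Firefox sends.

```python
import requests

url = "http://lema.rae.es/drae/srv/search?val=hacer"

# Hypothetical fields; the real names/values come from the challenge page.
formdata = [("field_c", "3"), ("field_a", "1"), ("field_b", "2")]

# Build and prepare the POST request without contacting the server.
prepared = requests.Request("POST", url, data=formdata).prepare()

# After a real request the same data is available as r.request.headers / r.request.body.
print(prepared.headers["Content-Type"])
print(prepared.body)  # field order is exactly the order of the list above
```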

What am I doing wrong?

You can inspect my full code here: http://pastebin.com/7JAZ9B4s

This is the response header I should be getting:

Date[Tue, 10 Feb 2015 17:13:53 GMT]
Vary[Accept-Encoding]
Content-Encoding[gzip]
Cache-Control[max-age=0, no-cache]
Keep-Alive[timeout=5, max=100]
Connection[Keep-Alive]
Content-Type[text/html; charset=UTF-8]
Set-Cookie[TS014dfc77=017ccc203c29467c4d9b347fb56ea0e89a7182e52b9d7b4a1174efbf134768569a005c7c85; Path=/]
Transfer-Encoding[chunked]

And this is the response header I really get:

Content-Length[5798]
Content-Type[text/html]
Pragma[no-cache]
Cache-Control[no-cache]
thomas
  • Found this via Google. This does NOT work: https://github.com/vibragiel/glotologia/blob/master/enmiendas_drae/enmiendas-drae.py On the other hand, the following suggests that we need to set a cookie: http://stackoverflow.com/questions/26952643/how-to-obtain-a-cookie-from-a-remote-domain-using-greasemonkey But for me the page also works with cookies disabled in the browser! – thomas Feb 11 '15 at 09:06
  • This works (!): https://github.com/javierhonduco/nebrija I still have to find out how it works, though... – thomas Feb 11 '15 at 09:46
  • ... it always sends the same challenge string and it doesn't set any cookies or anything?! – thomas Feb 11 '15 at 09:57

1 Answer


I found the reason why my code doesn't work:

The server expects the POSTDATA in exactly the same order in which the fields appear as `input` elements in the form. In my code, the values of the input elements were stored in a Python dict, but that data type does not preserve the order in which the values were inserted (dicts only guarantee insertion order since Python 3.7)!
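The order can be made explicit by keeping the form fields in a list of `(name, value)` pairs instead of a dict. A minimal sketch using only the standard library, assuming the relevant `input` values are already present in the HTML (on the real page some of them are computed by the JavaScript challenge) and using a made-up sample form: `html.parser` reports `input` tags in document order, and `urllib.parse.urlencode` encodes a list of pairs in the order given.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class InputCollector(HTMLParser):
    """Collect (name, value) pairs of <input> elements in document order."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name is not None:
            # Appending to a list keeps the order in which inputs appear.
            self.pairs.append((name, attributes.get("value") or ""))


def ordered_form_data(html):
    parser = InputCollector()
    parser.feed(html)
    parser.close()
    return parser.pairs


# Hypothetical sample form; field names are deliberately not alphabetical.
page = """
<form method="post">
  <input type="hidden" name="zeta" value="1">
  <input type="hidden" name="alpha" value="2">
  <input type="hidden" name="mid" value="a b">
</form>
"""

pairs = ordered_form_data(page)
body = urlencode(pairs)  # "zeta=1&alpha=2&mid=a+b"
```

requests also accepts such a list for `data`, e.g. `requests.post(url, data=pairs, headers=headers)`, and encodes it in the same order. Since Python 3.7 a plain dict preserves insertion order as well, but a list states the requirement explicitly and works on older versions.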

The Ruby script referred to in the comments, however, does work, because Ruby's Hash preserves insertion order (since Ruby 1.9)!

Furthermore, reimplementing the JavaScript challenge() function in Python was not necessary at all, because the server happily accepts any response string that has worked once before, over and over again!

thomas
  • One solution, i.e. one way to send the POST data in a predefined order, is to use urllib2.urlopen, which takes the POSTDATA as an already-encoded string in its `data` argument (and a string is necessarily ordered, of course). – thomas Sep 09 '15 at 05:59
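In Python 3, urllib2 became `urllib.request`, and `urlopen` still takes the POST body through a `data` argument, now as bytes. A minimal sketch, assuming Python 3 and hypothetical field names: the body is encoded once from an ordered list of pairs, so the byte string that goes over the wire has exactly that order. Only the request is built here; `send()` would perform the actual network call.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

URL = "http://lema.rae.es/drae/srv/search?val=hacer"


def build_post(url, pairs, headers=None):
    """Build a POST request whose body keeps the order of `pairs`."""
    # urlencode() walks the list in order; urllib requires the body as bytes.
    body = urlencode(pairs).encode("ascii")
    # Supplying data makes get_method() return "POST".
    return Request(url, data=body, headers=headers or {})


def send(request):
    # Network call; not executed in this example.
    with urlopen(request) as response:
        return response.read()


# Hypothetical fields, in the order the server expects them.
req = build_post(URL, [("field_c", "3"), ("field_a", "1")])
```

If no `Content-Type` header is passed, urllib adds `application/x-www-form-urlencoded` for requests that carry data.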