I understand in general how to make a POST request using urllib2 (encoding the data, etc.), but the problem is that all the tutorials online use made-up example URLs (someserver.com, coolsite.org, etc.), so I can't see the specific HTML that corresponds to their example code. Even python.org's own tutorial is no help in this regard.
I need to make a POST request to this URL:
https://patentscope.wipo.int/search/en/search.jsf
The relevant part of the code is this (I think):
<form id="simpleSearchSearchForm" name="simpleSearchSearchForm" method="post" action="/search/en/search.jsf" enctype="application/x-www-form-urlencoded" style="display:inline">
<input type="hidden" name="simpleSearchSearchForm" value="simpleSearchSearchForm" />
<div class="rf-p " id="simpleSearchSearchForm:sSearchPanel" style="text-align:left;z-index:-1;"><div class="rf-p-hdr " id="simpleSearchSearchForm:sSearchPanel_header">
Or maybe it's this:
<input id="simpleSearchSearchForm:fpSearch" type="text" name="simpleSearchSearchForm:fpSearch" class="formInput" dir="ltr" style="width: 400px; height: 15px; text-align: left; background-image: url('https://patentscope.wipo.int/search/org.richfaces.resources/javax.faces.resource/org.richfaces.staticResource/4.5.5.Final/PackedCompressed/classic/org.richfaces.images/inputBackgroundImage.png'); background-position: 1px 1px; background-repeat: no-repeat;">
If I want to encode JP2014084003 as the search term, which attribute of the input element in the HTML do I use as the key for it: the id? The name? The value?
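For reference, here is a minimal sketch of how a browser would build the request body for a form like this, assuming ordinary HTML form submission rules apply: each submitted pair is keyed by the input's name attribute (not its id), and the value is whatever is preset or typed in. FORM_HTML is a trimmed copy of the markup quoted above, InputCollector is a hypothetical helper written for this illustration, and the code uses the Python 3 standard library rather than urllib2. Whether the server accepts exactly this body is a separate question.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class InputCollector(HTMLParser):
    """Collect name/value pairs from <input> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> also end up here.
        if tag != "input":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if name:  # inputs without a name attribute are not submitted
            self.fields[name] = attrs.get("value") or ""


# Trimmed copy of the form markup from the page (styling attributes removed).
FORM_HTML = """
<form id="simpleSearchSearchForm" name="simpleSearchSearchForm" method="post" action="/search/en/search.jsf">
<input type="hidden" name="simpleSearchSearchForm" value="simpleSearchSearchForm" />
<input id="simpleSearchSearchForm:fpSearch" type="text" name="simpleSearchSearchForm:fpSearch" class="formInput">
</form>
"""

collector = InputCollector()
collector.feed(FORM_HTML)

fields = collector.fields
# The search term goes under the text input's *name*.
fields["simpleSearchSearchForm:fpSearch"] = "JP2014084003"

# application/x-www-form-urlencoded body, matching the form's enctype.
body = urlencode(fields)
print(body)
```

With requests, the same dictionary could be passed as data=fields, and the library would do the URL-encoding itself.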
Addendum: this answer does not answer my question, because it just repeats information I've already read on the Python docs page.
UPDATE:
I found this, and tried out the code in there, specifically:
import requests
headers = {'User-Agent': 'Mozilla/5.0'}
payload = {'name': 'simpleSearchSearchForm:fpSearch', 'value': '2014084003'}
link = 'https://patentscope.wipo.int/search/en/search.jsf'
session = requests.Session()
# Load the search page first so the session picks up its cookies
resp = session.get(link, headers=headers)
cookies = requests.utils.cookiejar_from_dict(requests.utils.dict_from_cookiejar(session.cookies))
# Submit the form data
resp = session.post(link, headers=headers, data=payload, cookies=cookies)
# Fetch the page again and save it to a file
r = session.get(link)
with open('htmltext.txt', 'wb') as f:
    f.write(r.content)
I get a successful response (200), but the data is, once again, simply the original page. So I don't know whether I'm posting to the form correctly and need to do something else to get the search results page back, or whether I'm still posting the data wrong.
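One way to narrow this down is to look at the response returned by the POST itself, since a fresh GET of the same URL would presumably just return the blank form again. The sketch below does that, and also copies every hidden input from the form page into the payload. That second step is an assumption: pages built with JSF often carry hidden state fields that the server expects back, but I have not confirmed that this site requires them, or that it does not rely on JavaScript. HiddenInputs and search are hypothetical names made up for this illustration, and session can be any object with requests-style get and post methods.

```python
from html.parser import HTMLParser


class HiddenInputs(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def search(session, url, field, term, headers=None):
    """GET the form page, POST the search term, and return the POST response."""
    page = session.get(url, headers=headers)

    # Assumption: the server wants the page's hidden fields echoed back.
    parser = HiddenInputs()
    parser.feed(page.text)
    payload = dict(parser.fields)

    # The search term is keyed by the text input's name attribute.
    payload[field] = term

    # Return this response directly instead of issuing another GET afterwards.
    return session.post(url, headers=headers, data=payload)


# Possible usage with requests (not run here, since it needs network access):
#   import requests
#   resp = search(requests.Session(),
#                 "https://patentscope.wipo.int/search/en/search.jsf",
#                 "simpleSearchSearchForm:fpSearch", "JP2014084003",
#                 headers={"User-Agent": "Mozilla/5.0"})
#   with open("htmltext.txt", "wb") as f:
#       f.write(resp.content)
```

If the saved file still shows only the empty form, comparing the payload against what the browser sends (visible in its developer tools) would show which fields are missing.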
And yes, I realize that this uses requests instead of urllib2, but all I want is to be able to get the data.