Check whether the host allows crawling.
curl http://www.etnet.com.hk/robots.txt |grep warrants
Allow: /www/tc/warrants/
Allow: /www/tc/warrants/realtime/
Allow: /www/sc/warrants/
Allow: /www/sc/warrants/realtime/
Allow: /www/eng/warrants/
Allow: /www/eng/warrants/realtime/
Allow: /mobile/tc/warrants/
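The same check can be done from Python with the standard-library `urllib.robotparser`. This is only a minimal sketch: the `User-agent: *` and `Disallow: /` lines and the `other` URL are hypothetical, added so the parser has something to decide on. Only the `Allow` lines come from the grep output above.

```python
from urllib.robotparser import RobotFileParser

# Assumption: "User-agent: *" and "Disallow: /" are hypothetical lines;
# only the Allow rules are taken from the grep output above.
ROBOTS_LINES = [
    "User-agent: *",
    "Allow: /www/tc/warrants/",
    "Allow: /www/sc/warrants/",
    "Allow: /www/eng/warrants/",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(ROBOTS_LINES)  # parse rules from memory, no network access

target = "http://www.etnet.com.hk/www/sc/warrants/search_warrant.php"
other = "http://www.etnet.com.hk/hypothetical/path"  # made-up path for contrast

print(parser.can_fetch("*", target))  # is the warrants search page allowed?
print(parser.can_fetch("*", other))   # is an unlisted path allowed?
```

To test against the live file instead, `parser.set_url("http://www.etnet.com.hk/robots.txt")` followed by `parser.read()` would replace the `parse` call.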
The target webpage is to be crawled with a urllib POST request.
I run into an issue when sending a POST request with a cookie: urllib.error.HTTPError: HTTP Error 503: Service Unavailable
Sending a POST request with a cookie
I have checked the request headers and parameters in Firefox.
Now I construct my POST request with a cookie.
import urllib.parse
import urllib.request as req
import http.cookiejar as cookie
cookie_jar = cookie.CookieJar()
opener = req.build_opener(req.HTTPCookieProcessor(cookie_jar))
req.install_opener(opener)
url = "http://www.etnet.com.hk/www/sc/warrants/search_warrant.php"
params = {
"underasset":"HSI",
"buttonsubmit":"搜寻",
"formaction":"submitted"
}
headers = {
'Accept':"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
'Accept-Encoding':"gzip, deflate",
'Accept-Language':"en-US,en;q=0.5",
'Connection':'keep-alive',
'Content-Length':'500',
'Content-Type':'application/x-www-form-urlencoded',
"Host":"www.etnet.com.hk",
"Origin":"http://www.etnet.com.hk",
"Referer":"http://www.etnet.com.hk/www/sc/warrants/search_warrant.php",
"Upgrade-Insecure-Requests":"1",
"User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0"
}
query_string = urllib.parse.urlencode(params)
data = query_string.encode()
cookie_req = req.Request(url, headers=headers, data=data,method='POST')
page = req.urlopen(cookie_req).read()
I get the following error when I execute the above code:
urllib.error.HTTPError: HTTP Error 503: Service Unavailable
What is the bug in my code, and how can I fix it? @NicoNing, the last issue is counting how many bytes the headers contain.
>>> s="""'Accept':'text/htmlpplication/xhtml+xmlpplication/xml;q=0.mage/webp,*/*;q=0.8',\
... 'Accept-Encoding':'gzip, deflate',\
... 'Accept-Language':'en-US,en;q=0.5',\
... 'Connection':'keep-alive',\
... 'Content-Type':'application/x-www-form-urlencoded',\
... 'Content-Length':'495',\
... 'Host':'www.etnet.com.hk',\
... 'Origin':'http://www.etnet.com.hk',\
... 'Referer':'http://www.etnet.com.hk/www/sc/warrants/search_warrant.php',\
... 'Upgrade-Insecure-Requests':'1',\
... 'User-Agent':'Mozilla/5.0 (X11; Linux x86_64; rv:74.0) Gecko/20100101 Firefox/74.0'"""
>>> len(s)
495
I still can't get a proper response with the above headers. If I do write Content-Length in the request headers, what value should I assign to it?
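On the Content-Length question: in HTTP the header gives the size of the request body in bytes, not the size of the headers, so the value would be the length of the URL-encoded form data. A minimal sketch, assuming a hypothetical helper named `build_post_request`; whether this alone clears the 503 from this server is not something the sketch can confirm.

```python
import urllib.parse
import urllib.request


def build_post_request(url, params, headers):
    """Hypothetical helper: encode the form body and size Content-Length from it."""
    body = urllib.parse.urlencode(params).encode("utf-8")
    # Drop any hand-written Content-Length so it cannot disagree with the body.
    clean = {k: v for k, v in headers.items() if k.lower() != "content-length"}
    # Content-Length counts the bytes of the encoded body, not of the headers.
    clean["Content-Length"] = str(len(body))
    return urllib.request.Request(url, data=body, headers=clean, method="POST")


params = {"underasset": "HSI", "buttonsubmit": "搜寻", "formaction": "submitted"}
request = build_post_request(
    "http://www.etnet.com.hk/www/sc/warrants/search_warrant.php",
    params,
    {"Content-Type": "application/x-www-form-urlencoded", "Content-Length": "500"},
)

# urllib stores header names capitalized as "Content-length".
print(request.get_header("Content-length"), len(request.data))
```

Leaving Content-Length out of the headers entirely is also an option: when `data` is a bytes object and no Content-Length is given, urllib.request sets the header itself.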