I am trying to scrape data from this website. The drop-down menus populate based on the data entered, so I am making multiple POST requests like this:

import requests
from bs4 import BeautifulSoup

url = 'http://59.180.234.21:85/index.aspx'

with requests.Session() as session:
    # Load the form once to read the initial hidden state fields.
    response = session.get(url)
    soup = BeautifulSoup(response.content, "html5lib")

    # Step 1: select the district so the police station drop-down populates.
    data = {
        'ddlDistrict': '165',
        '__VIEWSTATE': soup.find('input', {'name': '__VIEWSTATE'}).get('value', ''),
        '__EVENTVALIDATION': soup.find('input', {'name': '__EVENTVALIDATION'}).get('value', ''),
    }
    response = session.post(url, data=data)
    soup = BeautifulSoup(response.content, "html5lib")

    # Step 2: select the police station, reusing the refreshed hidden fields.
    data = {
        'ddlDistrict': '165',
        'ddlPS': '11',
        '__VIEWSTATE': soup.find('input', {'name': '__VIEWSTATE'}).get('value', ''),
        '__EVENTVALIDATION': soup.find('input', {'name': '__EVENTVALIDATION'}).get('value', ''),
    }
    response = session.post(url, data=data)
    soup = BeautifulSoup(response.content, "html5lib")

    # Step 3: submit the FIR number and year.
    data = {
        'ddlDistrict': '165',
        'ddlPS': '11',
        'txtRegNo': '100',
        'ddlYear': '2011',
        '__VIEWSTATE': soup.find('input', {'name': '__VIEWSTATE'}).get('value', ''),
        '__EVENTVALIDATION': soup.find('input', {'name': '__EVENTVALIDATION'}).get('value', ''),
    }
    response = session.post(url, data=data)

After this, the last page contains an HTML table with a button that I can click to view the report. I want to simulate clicking the button and get the response, which I can then parse with BeautifulSoup. Please let me know how to do this. The sample input District: "New Delhi Distt", Police Station: "Con.Place", FirNo: "100", Year: "2011" returns one FIR to view. The button has the following code:

onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("DgRegist$ctl03$imgDelete", "", true, "", "", false, false))"
Turab Hassan
  • "I want to be able to simulate clicking the button and getting the response(...)" - It looks like a task for [`selenium`](http://selenium-python.readthedocs.io/). Unless, of course, you could have the `url` beforehand. – Vinícius Figueiredo Aug 08 '17 at 21:06
  • Possible duplicate of [Python click button with requests](https://stackoverflow.com/questions/38393314/python-click-button-with-requests) – Vinícius Figueiredo Aug 08 '17 at 21:09

1 Answer

If you can reproduce the HTTP request the button makes, you'll have the data you want. If the button doesn't make any request, the data is already in the page somewhere and you just need to find it and parse it out.

EDIT:

In your case the button submits the form data back to the same page. To reproduce this, include the form data in your request to the page, and the response will contain the resulting data. For example:

import requests

headers = {
    'Origin': 'http://59.180.234.21:85',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.8',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Cache-Control': 'max-age=0',
    'Referer': 'http://59.180.234.21:85/index.aspx',
    'Connection': 'keep-alive',
}

data = [
  ('__EVENTTARGET', ''),
  ('__EVENTARGUMENT', ''),
  ('__LASTFOCUS', ''),
  ('__VIEWSTATE', '/wEPDwUJMTQ2MDgwNjA1D2QWAgIDD2QWAgIFD2QWBAIBD2QWCGYPZBYEAgEPZBYCAgEPEGRkFgFmZAIDD2QWAgIBDxBkZBYAZAIBD2QWBAIBD2QWAgIBDxAPFgYeDURhdGFUZXh0RmllbGQFCENpdHlOYW1lHg5EYXRhVmFsdWVGaWVsZAUIQ2l0eUNvZGUeC18hRGF0YUJvdW5kZ2QQFREMLS0tU0VMRUNULS0tDUNFTlRSQUwgRElTVFQSQ1JJTUUgQU5EIFJBSUxXQVlTEEVBU1QgREVMSEkgRElTVFQJSUdJIERJU1RUD05FVyBERUxISSBESVNUVAtOT1JUSCBESVNUVBBOT1JUSCBFQVNUIERJU1RUEE5PUlRIIFdFU1QgRElTVFQLT1VURVIgRElTVFQLU09VVEggRElTVFQQU09VVEggRUFTVCBESVNUVBBTT1VUSCBXRVNUIERJU1RUElNQRUNJQUwgQ0VMTCBESVNUVA5TUFVXICYgQyBESVNUVAlWSUdJTEFOQ0UKV0VTVCBESVNUVBURDC0tLVNFTEVDVC0tLQMxNjIDMTY0AzE2OAMxNjkDMTY1AzE2NgMxNzMDMTcyAzE3NAMxNjcDOTU1AzE3MQM5NTQDOTUzAzE2MQMxNzAUKwMRZ2dnZ2dnZ2dnZ2dnZ2dnZ2cWAQIFZAIDD2QWAgIBDxAPFgYfAAUHUFNfTmFtZR8BBQdQU19Db2RlHwJnZBAVCQwtLS1TRUxFQ1QtLS0PQkFSQUtIQU1CQSBST0FEDUNIQU5BS1lBIFBVUkkKQ09OLiBQTEFDRQpFWEguIEdST1VOC01BTkRJUiBNQVJHClBULiBTVFJFRVQKVElMQUsgTUFSRwtUVUdMQUsgUk9BRBUJDC0tLVNFTEVDVC0tLQIwMgIwNwIxMQIxMgIxNQIyMgIzNQIzNhQrAwlnZ2dnZ2dnZ2cWAQIDZAICD2QWBAIBD2QWAgIBDw8WAh4JTWF4TGVuZ3RoAgRkZAIDD2QWAgIBDxBkDxYHAgECAgIDAgQCBQIGAgcWBxAFBDIwMTcFBDIwMTdnEAUEMjAxNgUEMjAxNmcQBQQyMDE1BQQyMDE1ZxAFBDIwMTQFBDIwMTRnEAUEMjAxMwUEMjAxM2cQBQQyMDEyBQQyMDEyZxAFBDIwMTEFBDIwMTFnZGQCAw9kFgRmD2QWAgIBDxBkZBYBAgNkAgIPZBYCAgEPDxYCHgdFbmFibGVkaGRkAgMPZBYCAgEPZBYCZg9kFgICAQ9kFgQCAQ88KwALAGQCAw88KwARAgEQFgAWABYADBQrAABkGAEFCmdyZG12dGhlZnQPZ2SPDrK3c7Ukzq5Wg/XtZSQMgDzEoWpRz8kXOVH1TO1LcA=='),
  ('__EVENTVALIDATION', '/wEdAC8iT6D3HjIr+ivdq0yBTgClCsHRaAEHIr772zKgggdQ+5cM7ByNsRG4qWi12q7B1tveFDGmjlPiBn9IJO8m9jt8W1Wcqc3FqlgV9EENz1OdJenvj2TG96ujSrFeprbtr3RTWKEdLSZa5NFLztoz81urAMmLvBzV7Qyb4qeGafdxuGr4cVZnct4CZh3KKsvt+xdAs0fg094ls2+uRMaFDPjjvXQmtkg7agsuhug+xMVSXXqKkbM01pitokD3Lzhr/+Zrc1JkJBoj+hAGr8ppVSNG4Yj6XkYB+ZGeix5+udiv9J9IjbG0sujSnR9YEqeLFuIKGVNDezkrxdfUawGK33AxvjAuIFmExdxunofmSVMj2KhPcg/6G9KkHuC16bwbWAqSNP2Vcw4/0wky0Un3Ssd3cGZtjtv+8Amihean2n5uODEqvswSsIcl9+U0P3atZA9gLfz10VlY0S1jS6520f4SrEv7IkN+08PXTozm9OT6/xtTbG8qE+XuugkwabaWLRSnp8pclR+ltj186j/FXuFQADgLnY9pn1HgIJ6W1oaeYRGUECgQhKzewPcXKgm68keQY5UuqQXqAyLatchak9gZ0UXh+krR/3fyyNtTnsY2m8PCEGuPl86vYAMVmqqL9lXoXDEtci8mednFEKQQYva+qH6WXxs8JPfC5HROATEan29Lv0JBrmCBZS2sro8ULkaKOxbg8uzVwdeGr6v29r+3doU6WdnwFP0DXPL1dqxkGAcZoyyvxsCvu30nzr6m7V8lgJSWBob7Dm8GjVgW5r9J4pnX0P+2bLZvBfOH/t4fWMmWiUd3VkQPcKR+pddTuBtpJk290kZ4wQ4JdvCFsSKdBaNizvIH0xP0v3ruMbsMtxjvy3Vie7D95PeNV8/hUPt4D+GqPsOH44Eo2T+LfQkxwBWNveA+4s3aFDJlbkXzUPNrXlzDLLAaZVBaziFS2sS3u5FK3YA3jSyXSEoDlVEvjtTdVzRZn7DFyWrI8V/OY49Qu8R8qTviVpgIZnzlz1HnUusdQsXU9clbfRlGQn3F'),
  ('ddlDistrict', '165'),
  ('ddlPS', '11'),
  ('txtRegNo', '100'),
  ('ddlYear', '2011'),
  ('txRegFromDt', ''),
  ('txRegToDt', ''),
  ('txtCompNM', ''),
  ('btnSearch', 'Search'),
]

response = requests.post('http://59.180.234.21:85/index.aspx', headers=headers, data=data)

print(response.content)
>>> b'\r\n\r\n\r\n<!DOCTYPE html PUBLIC ...... FIR No.</a></td><td style="width:10%;"><a href="javascript:__doPostBack(&#39;DgRegist$ctl02$ctl03&#39;,&#39;&#39;)" style="color:Black;">Fir Year</a></td><td style="width:10%;">FIR Date</td><td style="width:15%;">\r\n                                            View FIR\r\n                                        </td>\r\n\t\t\t</tr><tr class="DataItemStyle ">\r\n\t\t\t\t<td>0100</td><td>2011</td><td>29-05-2011</td><td>\r\n                                            <input type="image" name="DgRegist$ctl03$imgDelete" id="DgRegist_ctl03_imgDelete" src="Images/print.gif" ... ... \r\n</form>\r\n</body>\r\n</html>\r\n'
B.Adler
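The response printed above contains the "View FIR" control as `<input type="image" name="DgRegist$ctl03$imgDelete">`. Browsers submit an image input as click coordinates under `<name>.x` and `<name>.y`, so one way to imitate the click is to re-post the form with those two keys added. The following is only a sketch under assumptions: `build_click_payload` is a hypothetical helper, the hidden-state values are placeholders that would have to come from the search-results page, and leaving out `btnSearch` for this request is a guess that has not been checked against the site.

```python
def build_click_payload(form_fields, button_name, x=1, y=1):
    """Return form data that imitates clicking an <input type="image">.

    Browsers send the click position of an image input as two fields,
    "<name>.x" and "<name>.y"; the coordinate values here are arbitrary.
    """
    payload = dict(form_fields)  # copy so the caller's dict is not modified
    payload[button_name + ".x"] = str(x)
    payload[button_name + ".y"] = str(y)
    return payload


# Visible field values are taken from the example above. The two hidden-state
# values are placeholders (assumption): in practice they must be read from
# the search-results page that contains the button.
fields = {
    "__VIEWSTATE": "<value from the results page>",
    "__EVENTVALIDATION": "<value from the results page>",
    "ddlDistrict": "165",
    "ddlPS": "11",
    "txtRegNo": "100",
    "ddlYear": "2011",
}

payload = build_click_payload(fields, "DgRegist$ctl03$imgDelete")
print(sorted(key for key in payload if key.endswith((".x", ".y"))))
```

The resulting `payload` would then be passed as `data=` to a `session.post(...)` call on the same session that performed the search, so that any cookies are carried over.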
  • The button has the following code `onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("DgRegist$ctl03$imgDelete", "", true, "", "", false, false))"`. Not sure whether that means I can generate the HTTP request or not. – Turab Hassan Aug 10 '17 at 17:06
  • If you open the browser's inspector and have the Network tab open when you click the button, you'll see a request if one is made. If the button does not make a request to anything, then the data is likely already there somewhere, as it is on sites such as Facebook. Looking through the page source (not the code in the inspector) would tell you whether the data is somewhere in the JavaScript. – B.Adler Aug 10 '17 at 17:43
  • In your case the button submits the form data back to the same page. To reproduce this, include the form data in your request to the page, and the response will contain the resulting data. – B.Adler Aug 10 '17 at 17:48
  • Thanks for the detailed response. The above code produces a network error. It's missing a comma in line 21 as well. Don't we have to make the POST requests step by step, since selecting one drop-down item populates the next, rather than making a single POST request? – Turab Hassan Aug 15 '17 at 19:15
  • The above code gives a network error because the form is incomplete. At each step of the dropdown it adds more to the form, so yes you can make each request incrementally, but if you have all the data being submitted in the final request you should be able to just submit all the data at once to retrieve the page you want. – B.Adler Aug 15 '17 at 22:02
  • I am unable to figure out why the above code gives an error. I understand that if we have all the data we can make a single POST request. I believe the example above has all the data, but it still gives an error and I don't understand why. Help would be appreciated. Using `('ddlDistrict', '165'), ('ddlPS', '11'), ('txtRegNo', '100'), ('ddlYear', '2011'),` should give at least one report, as it does in the browser. – Turab Hassan Aug 23 '17 at 19:21
  • Have you also included the viewstate and eventvalidation? I'll update the data object to show the working version from my machine. – B.Adler Aug 23 '17 at 19:58
  • The edit I just submitted shows the one listing from the printout for the data you gave an example of. – B.Adler Aug 23 '17 at 20:07
  • Thanks a lot. Just so I understand, where are you getting the values of viewstate and event validation from? Also, the listing points to a page, 59.180.234.21:85/ReportViewer.aspx. Is it possible to post a request directly to this page, and if so, what would the procedure be? – Turab Hassan Aug 27 '17 at 18:17
  • In your example you've got the viewstate and eventvalidation from the page; you'd just keep that bit and reintegrate it into each successive request. I'm not sure what you mean by posting a request directly to the page. We're posting requests to the page in the example to get the data. Sorry for the delayed response. – B.Adler Sep 26 '17 at 15:29
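Carrying the hidden state forward into each successive request, as described in the last comment, could look roughly like this. This is a sketch, not the method used in the answer: `HiddenFieldParser`, `hidden_fields` and `next_payload` are hypothetical names, the sample HTML is made up for an offline demonstration, and the standard library's `html.parser` is used only so the snippet runs without extra packages (the same fields could be collected with BeautifulSoup).

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collects the name/value pairs of every <input type="hidden">."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return the hidden form fields (e.g. __VIEWSTATE) found in `html`."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def next_payload(html, **visible_fields):
    """Merge the page's current hidden state with the visible form values."""
    payload = hidden_fields(html)
    payload.update(visible_fields)
    return payload


# Offline demonstration with a made-up page fragment (not the real site).
sample = """
<form>
  <input type="hidden" name="__VIEWSTATE" value="abc" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz" />
  <select name="ddlDistrict"></select>
</form>
"""
print(next_payload(sample, ddlDistrict="165"))
```

In a real run, `html` would be the text of the previous response, so that each POST sends the `__VIEWSTATE` and `__EVENTVALIDATION` values issued by the page it follows.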