I am new to Python and am trying to scrape search results through an .aspx form. When I execute this code, I get an error. I'm using Python 3.4.2.

    import urllib.parse
    import urllib.request
    from bs4 import BeautifulSoup

    # NOTE: this dict is defined but never passed to the opener below
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Origin': 'http://www.indiapost.gov.in',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko)  Chrome/24.0.1312.57 Safari/537.17',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Referer': 'http://www.indiapost.gov.in/pin/',
        'Accept-Encoding': 'gzip,deflate,sdch',
        'Accept-Language': 'en-US,en;q=0.8',
        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
    }

    class MyOpener(urllib.request.FancyURLopener):
        # User-Agent string sent with every request made by this opener
        version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'


    myopener = MyOpener()
    url = 'http://legistar.council.nyc.gov/Legislation.aspx'
    # first HTTP request without form data
    f = myopener.open(url)
    soup = BeautifulSoup(f, 'html.parser')

    #vstate = soup.select("#__VSTATE")[0]['value']
    viewstate = soup.select("#__VIEWSTATE")[0]['value']
    eventvalidation = soup.select("#__EVENTVALIDATION")[0]['value']

    formFields = (
        (r'__VSTATE', r''),
        (r'__VIEWSTATE', viewstate),
        (r'__EVENTVALIDATION', eventvalidation),
        (r'ctl00_RadScriptManager1_HiddenField', ''),
        (r'ctl00_tabTop_ClientState', ''),
        (r'ctl00_ContentPlaceHolder1_menuMain_ClientState', ''),
        (r'ctl00_ContentPlaceHolder1_gridMain_ClientState', ''),
        (r'ctl00$ContentPlaceHolder1$chkOptions$0', 'on'),  # file number
        (r'ctl00$ContentPlaceHolder1$chkOptions$1', 'on'),  # legislative text
        (r'ctl00$ContentPlaceHolder1$chkOptions$2', 'on'),  # attachment
        (r'ctl00$ContentPlaceHolder1$txtSearch', 'york'),   # search text
        (r'ctl00$ContentPlaceHolder1$lstYears', 'All Years'),  # years to include
        (r'ctl00$ContentPlaceHolder1$lstTypeBasic', 'All Types'),  # types to include
        (r'ctl00$ContentPlaceHolder1$btnSearch', 'Search Legislation')  # search button itself
    )

    encodedFields = urllib.parse.urlencode(formFields)
    # second HTTP request with form data
    f = myopener.open(url, encodedFields)

    # actually we'd better use BeautifulSoup once again to
    # retrieve results (instead of writing out the whole HTML file).
    # Besides, since the result is split across multiple pages,
    # we need to send more HTTP requests
    try:
        fout = open('tmp.html', 'wb')
    except OSError:
        print('Could not open output file\n')
    else:
        # only write when the output file was opened successfully
        fout.writelines(f.readlines())
        fout.close()

This script returns no results.

How do I make the script search the form and return the results?

DJ Howarth

1 Answer

As Andrei mentioned in the comments, you'll need to import urllib. You will probably run into other problems with your code as well, because you're hardcoding __VIEWSTATE and __EVENTVALIDATION.

Hui Zheng did a good job explaining this, which helped me figure it out, so I'll just link to his answer rather than try to paraphrase it.
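
To illustrate the idea of not hardcoding those values, here is a minimal sketch that reads every hidden input from the freshly fetched page and merges it with the fields you fill in yourself. It uses only the standard library; the names HiddenFieldParser, extract_hidden_fields, build_post_body and search are made up for this example, and I am assuming the page is UTF-8 and that the server accepts a plain form POST with no extra cookies or headers, which I have not verified for this site.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if is_hidden and attrs.get("name"):
            # A missing value attribute is submitted as an empty string
            self.fields[attrs["name"]] = attrs.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of hidden form fields (e.g. __VIEWSTATE) found in html."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def build_post_body(html, visible_fields):
    """Merge fresh hidden fields with the fields you set; return POST bytes."""
    fields = extract_hidden_fields(html)
    fields.update(visible_fields)
    # urlencode output is pure ASCII, so this encode cannot fail
    return urlencode(fields).encode("ascii")


def search(url, visible_fields):
    """GET the form, then POST it back with the hidden fields just received.

    Performs network I/O. Assumes a UTF-8 page (hypothetical for this site).
    """
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    request = Request(url, data=build_post_body(html, visible_fields))
    with urlopen(request) as response:
        return response.read()
```

You would call something like search(url, {'ctl00$ContentPlaceHolder1$txtSearch': 'york', 'ctl00$ContentPlaceHolder1$btnSearch': 'Search Legislation'}) and then parse the returned bytes. Because the hidden values are taken from the GET response each time, they stay in sync with whatever the server issued for that request.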

jgysland
  • I got this code from here: http://stackoverflow.com/questions/1480356/how-to-submit-query-to-aspx-page-in-python. I have no clue what __VIEWSTATE and __EVENTVALIDATION actually do; I am currently researching it. – DJ Howarth Dec 01 '14 at 16:25