How do I use python mechanize to retrieve a file from an aspnetForm submitControl that triggers an Excel file download when I don't know the file URL or file name?

URL of site with Excel file: http://www.ncysaclassic.com/TTSchedules.aspx?tid=NCFL&year=2012&stid=NCFL&syear=2012&div=U11M01

I'm trying to get the file downloaded by the Print Excel 'button'.

So far I have:

import mechanize

br = mechanize.Browser()
r = br.open('http://www.ncysaclassic.com/TTSchedules.aspx?tid=NCFL&year=2012&stid=NCFL&syear=2012&div=U11M01')
html = r.read()

# Show the html title
print br.title()

# Show the available forms
for f in br.forms():
    print f

br.select_form('aspnetForm')
print '\n\nSubmitting...\n'
br.submit("ctl00$ContentPlaceHolder1$btnExtractSched")

print 'Response...\n'
print br.response().info()
print br.response().read

print 'still alive...\n'

for prop, value in vars(br.response()).iteritems():
    print 'Property:', prop, ', Value: ', value

print 'myfile...\n' 

myfile = br.response().read

and I get this output:

    Submitting...

    Response...

Content-Type: application/vnd.ms-excel
Last-Modified: Thu, 27 Sep 2012 20:19:10 GMT
Accept-Ranges: bytes
ETag: W/"6e27615aed9ccd1:0"
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Thu, 27 Sep 2012 20:19:09 GMT
Connection: close
Content-Length: 691200

<bound method response_seek_wrapper.read of <response_seek_wrapper at 0x2db5248L whose wrapped object = <closeable_response at 0x2e811c8L whose fp = <socket._fileobject object at 0x0000000002D79930>>>>
still alive...

Property: _headers , Value:  Content-Type: application/vnd.ms-excel
Last-Modified: Thu, 27 Sep 2012 20:19:10 GMT
Accept-Ranges: bytes
ETag: W/"6e27615aed9ccd1:0"
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Thu, 27 Sep 2012 20:19:09 GMT
Connection: close
Content-Length: 691200

Property: _seek_wrapper__read_complete_state , Value:  [False]
Property: _seek_wrapper__have_readline , Value:  True
Property: _seek_wrapper__is_closed_state , Value:  [False]
Property: _seek_wrapper__pos , Value:  0
Property: wrapped , Value:  <closeable_response at 0x2e811c8L whose fp = <socket._fileobject object at 0x0000000002D79930>>
Property: _seek_wrapper__cache , Value:  <cStringIO.StringO object at 0x0000000002E8B0D8>

It seems I am very close. Note the Content-Type: application/vnd.ms-excel

I just don't know what to do next. Where is my file, and how do I get a pointer to it and save it locally for access later?

Update:

I used dir() to get a list of methods/attributes for the response() and then tried a couple of the methods...

print '\ndir(br.response())\n'
for each in dir(br.response()):
    print each

print '\nresponse info...\n'
print br.response().info()

print '\nresponse geturl\n'
print br.response().geturl()

and I get this output...

dir(br.response())

__copy__
__doc__
__getattr__
__init__
__iter__
__module__
__repr__
__setattr__
_headers
_seek_wrapper__cache
_seek_wrapper__have_readline
_seek_wrapper__is_closed_state
_seek_wrapper__pos
_seek_wrapper__read_complete_state
close
get_data
geturl
info
invariant
next
read
readline
readlines
seek
set_data
tell
wrapped
xreadlines

response info...

Date: Thu, 27 Sep 2012 20:55:02 GMT
ETag: W/"fa759b5df29ccd1:0"
Server: Microsoft-IIS/7.5
Connection: Close
Content-Type: application/vnd.ms-excel
X-Powered-By: ASP.NET
Accept-Ranges: bytes
Last-Modified: Thu, 27 Sep 2012 20:55:03 GMT
Content-Length: 691200


response geturl

http://www.ncysaclassic.com/photos/pdftemp/ScheduleExcel165502.xls

I think I already have this file in br.response(). I just don't know how to extract it! Please help.

hokie85
  • I'm getting closer it seems... – hokie85 Sep 27 '12 at 20:58
  • These both worked for me:

        print '\nAttempting to write file 1...\n'
        # found this here http://stackoverflow.com/questions/8116623/how-to-download-a-file-in-python
        # open("/path/to/someFile", "wb").write(urllib2.urlopen("http://someUrl.com/somePage.html").read())
        open("C:\Users\gregb\Downloads\download.xls", "wb").write(br.response().read())
        print '\nAttempting to write file 2...\n'
        open("C:\Users\gregb\Downloads\urllib2_urlopen.xls", "wb").write(urllib2.urlopen("http://www.ncysaclassic.com/photos/pdftemp/ScheduleExcel172625.xls").read())

    – hokie85 Sep 27 '12 at 21:37

1 Answer

# fill out the form, then submit it
response = br.submit()
# open in binary mode ('wb'): an .xls file is binary data
with open('filename', 'wb') as fileobj:
    fileobj.write(response.read())
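
If you would rather not hard-code the file name, here is a minimal sketch that takes it from the final URL reported by geturl() (the question's output shows this ends in the generated .xls name). The helper name save_response and the fallback name download.xls are my own placeholders, not part of mechanize. The function only assumes the response object has read() and geturl(), which the dir() listing in the question shows it does.

```python
import os

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def save_response(response, directory="."):
    """Write a file-like HTTP response to disk and return the local path.

    save_response is a hypothetical helper; `response` only needs
    read() and geturl().
    """
    # Use the last path segment of the final URL as the local file name.
    # "download.xls" is an arbitrary fallback if the URL has no file name.
    name = os.path.basename(urlparse(response.geturl()).path) or "download.xls"
    path = os.path.join(directory, name)
    # 'wb' matters: the spreadsheet is binary data, and text mode can
    # corrupt it on Windows.
    with open(path, "wb") as fileobj:
        fileobj.write(response.read())
    return path
```

After br.submit(...), something like save_response(br.response(), "C:\\Users\\gregb\\Downloads") should then write the spreadsheet under its server-side name.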
root
  • I was going to post my code in the comments, but it's difficult if I can't enter a linefeed. Let me try the markup formatting:

        # THIS WORKS!
        # open a local file instance
        fileobj = open("C:\\Users\\gregb\\Downloads\\ncysa_schedule.xls", "w+")
        # write to it from the submit response above
        fileobj.write(br.response().read())
        fileobj.close()
        # that's it!

    Do you know how to hide the gzip(True) warning?

        br.set_handle_gzip(True)  # this gives a warning - how to suppress it?

    – hokie85 Sep 28 '12 at 16:20
  • Correction: since it's an Excel .xls file, I have to use wb – hokie85 Sep 28 '12 at 16:32
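
On the gzip warning asked about in the comments: one possible approach, assuming the message is raised through Python's standard warnings module (which I believe is how older mechanize releases report that gzip handling is experimental), is to make the call inside a warnings.catch_warnings() block. The helper name call_quietly below is hypothetical.

```python
import warnings


def call_quietly(func, *args, **kwargs):
    """Call func(*args, **kwargs) with Python warnings suppressed.

    call_quietly is a hypothetical helper name. The previous warning
    filters are restored when the with-block exits.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return func(*args, **kwargs)
```

With the browser from the question this would be used as call_quietly(br.set_handle_gzip, True). It hides every warning raised during that one call, so it is best kept to calls whose warnings you have already read.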