
I have to download a large number of files from a local server. When I open the URL in the browser (Firefox), a page opens with the content "File being generated.. Wait..", and then a popup appears with the option to save the required .xlsx file.

I tried to save the page object using urllib, but it saves an .html file whose content is "File being generated.. Wait..". I used the code described here (using urllib2): How do I download a file over HTTP using Python?

I don't know how to download the file that the server sends later. It works fine in the browser. How can I emulate this in Python?

kumardeepakr3

3 Answers


First of all, you have to know the exact URL where the document is generated. You can use Firefox with the Live HTTP Headers add-on to capture it.

Then use Python to "simulate" the same request.

I hope that helps.

PS: or share the URL of the site and I could help you better.
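A minimal sketch of "simulating" the captured request, assuming you have already found the real download URL with the add-on. The URL, header values, and the `download_file` helper below are placeholders for illustration, not the actual site:

```python
import requests

def download_file(url, dest, headers=None):
    """Fetch a binary file and write it to dest, following redirects."""
    response = requests.get(url, headers=headers, allow_redirects=True)
    response.raise_for_status()  # fail loudly on HTTP errors
    with open(dest, 'wb') as f:
        f.write(response.content)

if __name__ == '__main__':
    # Replace with the URL and headers captured from the browser's request.
    download_file('http://localhost/reports/export.xlsx', 'report.xlsx',
                  headers={'User-Agent': 'Mozilla/5.0'})
```

Writing `response.content` in binary mode is what keeps the .xlsx bytes intact; saving the first HTML page instead usually means the request went to the "Wait.." page rather than the generated file's URL.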

nguaman
  • Thanks. The Live HTTP Headers add-on helped me figure out the link from where the actual file was downloaded. In my case the server was dynamically changing the contents of a common file on the server side and was redirecting me to that common link, which downloaded the file. – kumardeepakr3 Nov 10 '15 at 15:16
import requests

url = 'https://readthedocs.org/projects/python-guide/downloads/pdf/latest/'
myfile = requests.get(url, allow_redirects=True)  # follow redirects to the real file
with open('c:/example.pdf', 'wb') as f:
    f.write(myfile.content)

A bit old, but I faced the same problem. The key to the solution is allow_redirects=True.
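If the download still comes back as HTML, inspecting the redirect chain shows where the request actually ended up. A sketch, with `redirect_chain` being a hypothetical helper rather than part of the requests API:

```python
import requests

def redirect_chain(url):
    """Return the (status, url) hops requests followed, ending at the final response."""
    r = requests.get(url, allow_redirects=True)
    hops = [(h.status_code, h.url) for h in r.history]  # intermediate 30x responses
    return hops + [(r.status_code, r.url)]
```

An empty `r.history` means the server never redirected, so the problem is the URL itself and not the redirect handling.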

Michaela

Is it as simple as this?

import urllib2
import time

response = urllib2.urlopen('http://www.example.com/')
time.sleep(10)  # Or however long you need.
html = response.read()
Gree Tree Python
  • Doesn't work. Results are the same as before. time.sleep(10) doesn't make the response contain the .xlsx content instead of the HTML. – kumardeepakr3 Nov 10 '15 at 13:48