
So I have a question: how does one get the files attached to a webpage, and the URLs that point to them? For example, google.com.

So we go to google.com, open Firebug (Firefox/Chrome), and go to the "Network" panel. There we can see the location of every attached file, along with its file extension.

How do I do this in Python?

For URL work I usually look at urllib/mechanize/selenium, but none of these seem to support what I want, or I don't know the code that would be needed.

I'm using Python 2.7 on Linux. Any help/answers would be awesome, and thank you to anyone attempting to answer this.

Edit: I don't know how the back-end servers generate these things, but Firebug shows this information in its "Net"/"Network" panel. I wondered whether it could somehow be done in Python.

Skarlett
If I understood your question, you want something like [this](https://crondev.wordpress.com/2014/06/15/use-python-to-download-files-from-websites/). – jlnabais Sep 15 '15 at 15:53

2 Answers


From the looks of it, you can adapt the answer from Download image file from the HTML page source using python?, except modify it to also find the URLs in `<script>` tags (for JS) and `<link>` tags (for CSS) and whatever else you need.
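
Something along these lines might work. This is a rough sketch of that idea, not the linked answer verbatim: it assumes BeautifulSoup is installed (`pip install beautifulsoup4`), and `find_resource_urls` is just an illustrative name.

```python
import urllib2
from urlparse import urljoin

from bs4 import BeautifulSoup


def find_resource_urls(page_url):
    """Return absolute URLs of resources referenced in the page's HTML."""
    html = urllib2.urlopen(page_url).read()
    soup = BeautifulSoup(html, 'html.parser')

    urls = []
    # <img src=...> and <script src=...> -> images and javascript files
    for tag in soup.find_all(['img', 'script']):
        if tag.get('src'):
            urls.append(urljoin(page_url, tag['src']))
    # <link href=...> -> stylesheets, favicons, etc.
    for tag in soup.find_all('link'):
        if tag.get('href'):
            urls.append(urljoin(page_url, tag['href']))
    return urls


if __name__ == '__main__':
    for url in find_resource_urls('https://www.google.com'):
        print url
```

Relative paths are resolved with `urljoin`, since `src`/`href` attributes are often relative to the page rather than full URLs.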

postelrich

It is not difficult to parse the webpage and find the links to all of the "attached" files (CSS, icons, JS, images, etc.) that the browser will fetch, i.e. the requests you see in the 'Network' panel.
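
For instance, here is a minimal sketch using only the Python 2.7 standard library (no third-party parser). `ResourceCollector` is just an illustrative name; it grabs the `src`/`href` of `img`, `script` and `link` tags, which covers most of what a static page pulls in.

```python
import urllib2
from HTMLParser import HTMLParser
from urlparse import urljoin


class ResourceCollector(HTMLParser):
    """Collect resource URLs (images, scripts, stylesheets) while parsing."""

    RESOURCE_ATTRS = {'img': 'src', 'script': 'src', 'link': 'href'}

    def __init__(self, base_url):
        HTMLParser.__init__(self)
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = self.RESOURCE_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the page URL.
                self.urls.append(urljoin(self.base_url, value))


page = 'https://www.google.com'
collector = ResourceCollector(page)
collector.feed(urllib2.urlopen(page).read())
print '\n'.join(collector.urls)
```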

The harder part is that some files are fetched by JavaScript using Ajax. The only way to capture those (completely and correctly) is to simulate a browser (parse the HTML+CSS and run the JavaScript), which I don't think Python can do on its own.
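
That said, selenium (which the question already mentions) can drive a real browser from Python, so a partial workaround is to let the browser execute the page's JavaScript and then read the rendered DOM. A hedged sketch, assuming Firefox plus an older (pre-4.x) selenium API; note this still won't list raw Ajax/XHR responses the way the Network panel does, only tags that end up in the DOM:

```python
from urlparse import urljoin

from selenium import webdriver


def rendered_resource_urls(page_url):
    """Load the page in Firefox, let its javascript run, then list resources."""
    driver = webdriver.Firefox()
    try:
        driver.get(page_url)
        urls = set()
        # Inspect the *rendered* DOM, so tags injected by javascript show up too.
        for tag, attr in (('img', 'src'), ('script', 'src'), ('link', 'href')):
            for element in driver.find_elements_by_tag_name(tag):
                value = element.get_attribute(attr)
                if value:
                    urls.add(urljoin(page_url, value))
        return urls
    finally:
        driver.quit()


if __name__ == '__main__':
    for url in sorted(rendered_resource_urls('https://www.google.com')):
        print url
```

For a true request-by-request view you would need something outside plain selenium, such as a logging proxy (e.g. browsermob-proxy) or the browser's own network export, but that goes beyond this sketch.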

Zhonghua Xi