I am looking for a Python script that takes the URL of a website and downloads the complete HTML source code, along with its linked CSS files, to the local computer where the script is running.
Can anyone help me with this?
Yes, that's easy. You can use PyCurl (a Python binding for curl).
But (most probably) what you will get is processed HTML+JavaScript (i.e., exactly what a client browser receives).
As for JavaScript, most production/business websites use JavaScript frameworks that optimize the code, making it unreadable for humans. The same is true for HTML: many frameworks allow a hierarchical architecture for HTML (extensible templates), so what you will get is a single HTML file per page, (most probably) generated by the framework from many template files. CSS is a bit simpler than the other two ;).
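For a quick test without installing PycURL, the same fetch can be sketched with the standard library's urllib.request instead; the function name and URL below are only placeholders, not anything from the thread:

```python
# Minimal sketch: fetch a page's raw HTML source. Uses the standard
# library (urllib.request) as a stand-in for PycURL, so nothing needs
# to be installed. Assumes the response is UTF-8 text.
from urllib.request import urlopen

def fetch(url):
    """Return the response body of `url` decoded as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")

# Example (placeholder URL):
# html = fetch("http://example.com")
```

As the answer notes, this gives you the HTML as served, not whatever JavaScript later renders on top of it.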
I agree with 0xc0de and Joddy. PyCurl and HTTrack can do what you want. If you're using a 'nix OS, you can also use wget.
Yes, it's possible. As a matter of fact, I finished writing a script like the one you described a few days ago. ;) I won't post the script here, but I'll give you some hints based on what I've done.

1. Download the webpage. You can use urllib2.urlopen (Python 2.x) or urllib.request.urlopen (Python 3) for that.
2. Parse the downloaded page and get all the links you need. You can use BeautifulSoup for this. Then download all the content you need (use the same code you used to download the webpage in step 1).
3. Replace each href/src with the local path of your CSS/image/JS files. You can use fileinput for in-place text replacements. Refer to this SO post for further details.

That's it. Optional things you have to worry about are connecting/downloading through a proxy (if you're behind one), creating folders, and logging.
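Steps 2 and 3 above can be sketched using only the standard library; html.parser stands in for BeautifulSoup and a plain string replacement stands in for fileinput. All names and the sample HTML here are illustrative, not taken from the answer:

```python
# Sketch of steps 2-3: collect href/src links from a page, then
# rewrite them to local paths. Standard library only: html.parser
# replaces BeautifulSoup, str.replace replaces fileinput.
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect every href/src attribute value found in the HTML."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)

def localize(html, mapping):
    """Rewrite remote URLs to local file paths (step 3)."""
    for remote, local in mapping.items():
        html = html.replace(remote, local)
    return html

# Illustrative page content; in practice this comes from step 1's download.
page = ('<link rel="stylesheet" href="http://site/style.css">'
        '<img src="http://site/logo.png">')
collector = LinkCollector()
collector.feed(page)
# collector.links now holds the two URLs; download each one with the
# step-1 code, then point the page at the saved local copies:
local_page = localize(page, {"http://site/style.css": "style.css",
                             "http://site/logo.png": "logo.png"})
```

For real pages you would also need to resolve relative URLs (urllib.parse.urljoin) before downloading, which the sketch leaves out.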
You could also use Scrapy. Check this blog post on how to crawl a website using Scrapy.