I am looking for a Python script that takes the URL of a website and downloads the complete HTML source code, along with its linked CSS files, to the local computer where the script is running.
Can anyone help me with this?
Yes, that's easy. You can use PyCurl (a Python binding for curl).
But (most probably) what you will get is processed HTML+JavaScript (i.e., exactly what a client browser receives).
As for JavaScript, most production/business websites use JavaScript frameworks that optimize the code, making it unreadable for humans. The same is true for HTML: many frameworks allow a hierarchical architecture for HTML (extensible templates), so what you will get is a single HTML file per page, (most probably) generated by the framework from many template files. CSS is a bit simpler than the other two ;).
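For a quick test without installing PycURL, the same fetch can be sketched with the standard library's urllib.request instead; the function name and URL below are only placeholders, not anything from the thread:

```python
# Minimal sketch: fetch a page's raw HTML source. Uses the standard
# library (urllib.request) as a stand-in for PycURL, so nothing needs
# to be installed. Assumes the response is UTF-8 text.
from urllib.request import urlopen

def fetch(url):
    """Return the response body of `url` decoded as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")

# Example (placeholder URL):
# html = fetch("http://example.com")
```

As the answer notes, this gives you the HTML as served, not whatever JavaScript later renders on top of it.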
I agree with 0xc0de and Joddy. PyCurl and HTTrack can do what you want. If you're using a 'nix OS, you can also use wget.
Yes, it's possible. As a matter of fact, I finished writing a script like the one you described a few days ago. ;) I won't post the script here, but I'll give you some hints based on what I've done.

1. Download the webpage. You can use urllib2.urlopen (Python 2.x) or urllib.request.urlopen (Python 3) for that.
2. Parse the downloaded page and get all the links you need. You can use BeautifulSoup for this. Then download all the content you need (use the same code you used to download the webpage in step 1).
3. Replace each href/src with the local path of your CSS/image/JS files. You can use fileinput for in-place text replacements. Refer to this SO post for further details.

That's it. Optional things you have to worry about are connecting/downloading through a proxy (if you're behind one), creating folders, and logging.
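Steps 2 and 3 above can be sketched using only the standard library; html.parser stands in for BeautifulSoup and a plain string replacement stands in for fileinput. All names and the sample HTML here are illustrative, not taken from the answer:

```python
# Sketch of steps 2-3: collect href/src links from a page, then
# rewrite them to local paths. Standard library only: html.parser
# replaces BeautifulSoup, str.replace replaces fileinput.
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect every href/src attribute value found in the HTML."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)

def localize(html, mapping):
    """Rewrite remote URLs to local file paths (step 3)."""
    for remote, local in mapping.items():
        html = html.replace(remote, local)
    return html

# Illustrative page content; in practice this comes from step 1's download.
page = ('<link rel="stylesheet" href="http://site/style.css">'
        '<img src="http://site/logo.png">')
collector = LinkCollector()
collector.feed(page)
# collector.links now holds the two URLs; download each one with the
# step-1 code, then point the page at the saved local copies:
local_page = localize(page, {"http://site/style.css": "style.css",
                             "http://site/logo.png": "logo.png"})
```

For real pages you would also need to resolve relative URLs (urllib.parse.urljoin) before downloading, which the sketch leaves out.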
You could also use Scrapy. Check this blog post on how to crawl a website using Scrapy.