I am trying to write a web crawler, so the first step is to analyze the web page. I use urllib2.urlopen("url") to get the page, but the page takes a while to load because of all the JavaScript and so on, so each time I only get part of the page. This is blocking me. Could anyone give me some advice?
- use some sort of headless browser, or browser driver, because urllib does not execute JS for you – dm03514 Mar 10 '14 at 13:52
- I wonder if scrapy could solve this. – lambda_lin Mar 10 '14 at 14:01
- http://stackoverflow.com/questions/8550114/can-scrapy-be-used-to-scrape-dynamic-content-from-websites-that-are-using-ajax, http://stackoverflow.com/questions/10647741/executing-javascript-functions-using-scrapy-in-python, https://github.com/scrapinghub/scrapyjs, http://jackliusr.blogspot.com/2013/11/scrapy-to-crawl-dynamic-contents.html – dm03514 Mar 10 '14 at 14:04
- I answered a similar question a while back, [have a look](https://stackoverflow.com/questions/22028775/tried-python-beautifulsoup-and-phantom-js-still-cant-scrape-websites/22030553#22030553) at it. – Steinar Lima Mar 10 '14 at 14:30
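To illustrate what the comments mean by "urllib does not execute JS": urllib returns only the raw HTML, so script bodies arrive as inert text and the DOM changes they would make never happen. A small stdlib-only sketch (the HTML string is a made-up example page standing in for a real response):

```python
from html.parser import HTMLParser

# Raw HTML as urllib would return it: the <div> stays empty, because the
# JavaScript that fills it never runs outside a browser.
RAW_HTML = """
<html><body>
  <div id="content"></div>
  <script>
    document.getElementById("content").textContent = "Loaded!";
  </script>
</body></html>
"""

class ScriptFinder(HTMLParser):
    """Collects <script> bodies to show they arrive as plain text."""
    def __init__(self):
        super().__init__()
        self.in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script and data.strip():
            self.scripts.append(data.strip())

parser = ScriptFinder()
parser.feed(RAW_HTML)
print(parser.scripts)  # the script source, not its effect on the page
```

This is why the comments point at headless browsers or Scrapy add-ons: something has to actually execute that script before the "loaded" content exists.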
1 Answer
You can try PyExecJS if you want to execute JS code from Python. But executing client-side JavaScript is usually too costly for a simple crawler.

user3401858