
I'm trying to build a web crawler, so the first step is to analyze the web page. I use urllib2.urlopen("url") to get the page, but the page takes a while to load because of all its JS and so on, so every time I only get part of the page. This is blocking me. Could anyone give me some advice?
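For context, a minimal sketch of what `urlopen` actually hands back (a hedged Python 3 example — `urllib2.urlopen` became `urllib.request.urlopen` there, and the `data:` URL is a stand-in for a real, JS-heavy page):

```python
# urlopen returns the HTML exactly as the server sent it; embedded <script>
# code is never executed, so JS-generated content will not be in the response.
# The data: URL below is a stand-in for a real page built by JavaScript.
import urllib.request

page = "data:text/html,<script>document.write('loaded')</script>"
html = urllib.request.urlopen(page).read().decode("utf-8")

print("document.write" in html)   # True: the script *source* comes back verbatim
```

The script tag arrives as text but is never run, which is why a JS-rendered page looks "incomplete" when fetched this way.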

lambda_lin
    Use some sort of headless browser, or a browser driver, because urllib does not execute JS for you – dm03514 Mar 10 '14 at 13:52
    I wonder if scrapy could solve this. – lambda_lin Mar 10 '14 at 14:01
  • http://stackoverflow.com/questions/8550114/can-scrapy-be-used-to-scrape-dynamic-content-from-websites-that-are-using-ajax, http://stackoverflow.com/questions/10647741/executing-javascript-functions-using-scrapy-in-python, https://github.com/scrapinghub/scrapyjs, http://jackliusr.blogspot.com/2013/11/scrapy-to-crawl-dynamic-contents.html – dm03514 Mar 10 '14 at 14:04
  • I answered a similar question a while back, [have a look](https://stackoverflow.com/questions/22028775/tried-python-beautifulsoup-and-phantom-js-still-cant-scrape-websites/22030553#22030553) at it. – Steinar Lima Mar 10 '14 at 14:30

1 Answer


You can try PyExecJS if you want to execute JS code from Python, but executing client-side code is usually too costly for a simple crawler.