
I am working on a screen-scraping tool in Python. However, when I looked through the source of the webpage, I noticed that most of the data is loaded through JavaScript.

Any ideas on how to scrape a JavaScript-based webpage? Is there a tool for this in Python?

Thanks

Kiran

3 Answers


Scraping JavaScript-based webpages is possible with Selenium. In particular, try Selenium WebDriver.
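A minimal sketch of that approach, assuming Selenium 4 with a local Chrome and chromedriver: `fetch_rendered_html` loads the page in headless Chrome (no browser window appears and no user actions are simulated) and returns `driver.page_source`, which is the DOM after the page's scripts have run. `extract_title` is a hypothetical helper showing that the result can then be parsed like any static HTML, here with the standard library's `html.parser`. The `--headless=new` flag is for recent Chrome versions; older ones use `--headless`.

```python
from html.parser import HTMLParser


def fetch_rendered_html(url: str) -> str:
    """Load `url` in headless Chrome and return the DOM after JavaScript has run."""
    # Imported lazily so the parsing helper below works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # assumption: recent Chrome; no visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source  # serialized current DOM, scripts already executed
    finally:
        driver.quit()  # always shut the browser down


class TitleParser(HTMLParser):
    """Hypothetical example parser: collects the text of the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data  # text may arrive in several chunks


def extract_title(html: str):
    """Return the page title, or None if the HTML has no <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

Usage would look like `extract_title(fetch_rendered_html("https://example.com"))`. Note that `driver.get` returns once the page's load event fires; data that scripts fetch after that may need an explicit wait (for example with `WebDriverWait`) before reading `page_source`.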

unutbu
  • I tried Selenium, but I don't want to mimic user actions. When I ran a sample program, it opened a browser window and mimicked the actions. I don't want that; I want to extract the data from the webpage into my code. – Kiran Nov 19 '11 at 11:18
  • You don't have to mimic user actions if you don't need to. Just download the page and parse it. The point of using Selenium is that it processes JavaScript for you. – unutbu Nov 19 '11 at 12:28

I use WebKit, the browser rendering engine behind Chrome and Safari. There are Python bindings to WebKit through Qt.

And here is a full Python example to execute JavaScript and extract the final HTML.

hoju

You can use the QtWebKit module of the PyQt4 library.
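A minimal sketch of that idea, assuming a Python 3 build of PyQt4 that includes the QtWebKit module (PyQt4 is no longer maintained) and a display for `QApplication` to attach to, such as a desktop session or Xvfb. The page is loaded into a `QWebPage` that is never shown, the Qt event loop runs until `loadFinished` fires, and `toHtml()` returns the frame's HTML after its JavaScript has executed. The function name `render_with_qtwebkit` is hypothetical.

```python
import sys


def render_with_qtwebkit(url: str) -> str:
    """Load `url` in an off-screen QWebPage and return the HTML after JavaScript runs."""
    # Lazy imports: PyQt4 is only required when the function is actually called.
    from PyQt4.QtCore import QUrl
    from PyQt4.QtGui import QApplication
    from PyQt4.QtWebKit import QWebPage

    # Reuse an existing QApplication if one is already running.
    app = QApplication.instance() or QApplication(sys.argv)
    page = QWebPage()  # never shown, so no window appears
    # Leave the event loop once the page (and scripts run during loading) has finished.
    page.loadFinished.connect(lambda ok: app.quit())
    page.mainFrame().load(QUrl(url))
    app.exec_()  # blocks until app.quit() is called above
    return page.mainFrame().toHtml()  # current DOM serialized as HTML
```

As with any load-event approach, content that scripts add asynchronously after `loadFinished` may not yet be present in the returned HTML.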

Will