0

I want to fetch a few values from a website. I used BeautifulSoup for this, but the fields come back blank in my Python script, even though I can clearly see the values in the table rows when I inspect the page elements in the browser. When I looked at the HTML source, the fields were blank there too. My guess is that the website uses JavaScript to populate these fields from its own database. If so, how can I fetch the values using Python?

Biswarup Dass
  • Your Python script is not a browser. You need a browser (or an emulated JavaScript interpreter) to run the JavaScript on the fetched page. – kasper Taeymans Feb 27 '15 at 12:47
  • Might be worth looking at [this question](http://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python) too. – elParaguayo Feb 27 '15 at 12:48
  • I think this is better suited to Selenium. There are Python bindings for Selenium as well: https://selenium-python.readthedocs.org/ – Joe Young Feb 27 '15 at 12:54
  • You cannot scrape content that is manipulated by JS (AJAX responses etc.). I faced the same issue, so I had to use Selenium. – Umair Ayub Mar 04 '15 at 08:00

2 Answers

1

The Python bindings for Selenium, with PhantomJS as the backend if you want a headless browser, are the appropriate tools for this job.
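As a rough illustration of the Selenium approach, here is a minimal sketch. It makes several assumptions: PhantomJS is no longer maintained, so headless Chrome is swapped in as the backend; a Selenium 4 style API is used; and the function name `fetch_table_rows` and the `"table tr"` selector are hypothetical and would need adapting to the actual page.

```python
def fetch_table_rows(url, row_selector="table tr", timeout=10):
    """Load `url` in headless Chrome and return the cell texts, row by row."""
    # Imported here so the module can be loaded even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # assumption: a recent Chrome is installed
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until at least one matching row is present in the DOM.
        # If the rows exist but are filled in later, wait on a more
        # specific selector (for example a non-empty cell) instead.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, row_selector))
        )
        rows = driver.find_elements(By.CSS_SELECTOR, row_selector)
        return [
            [cell.text for cell in row.find_elements(By.CSS_SELECTOR, "td, th")]
            for row in rows
        ]
    finally:
        # Always shut the browser down, even if the wait times out.
        driver.quit()
```

Calling `fetch_table_rows("https://example.com/table-page")` (a placeholder URL) would return a list of rows, each a list of cell strings, read from the page after its JavaScript has run.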

0

Yes, you can scrape JS data; it just takes a bit more hacking. Anything a browser can do, Python can do.

If you're using Firebug, look at the Network tab to see which request your data is coming from. Chrome's developer tools have a Network tab with the same information. Just hit Ctrl-F to search the response content of the requests.
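Once the Network tab has shown which request carries the data, that request can usually be replayed from Python directly. The following is only a sketch using the standard library; the endpoint URL, the `page` parameter, and the `X-Requested-With` header are placeholders for whatever the browser actually sends, and the response is assumed to be JSON.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(endpoint, params):
    """Build a GET request that mimics the browser's background (AJAX) call."""
    query = urlencode(params)
    return Request(
        f"{endpoint}?{query}",
        headers={
            "Accept": "application/json",
            # Some sites check this header; copy whatever the Network tab shows.
            "X-Requested-With": "XMLHttpRequest",
        },
    )


def fetch_json(request, timeout=10):
    """Send the request and decode the body, assuming it is UTF-8 JSON."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_json(build_request("https://example.com/api/rows", {"page": 1}))` would return the decoded data, provided the (hypothetical) endpoint exists and needs no cookies or tokens beyond what is sent here.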

Once you have found the right request, the data might be embedded in JS code, in which case you'll have some regex parsing to do. If you're lucky, the format is XML or JSON, in which case you can just use the corresponding built-in parser.
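For the embedded-in-JS case, one possible sketch is shown below. It assumes the script assigns a literal that happens to be valid JSON to a variable; the variable name `tableData` and the sample script are made up for illustration. A regex locates the assignment, and the standard `json` decoder parses the value that follows.

```python
import json
import re

# Hypothetical response body: a script that assigns a JSON literal to a variable.
SAMPLE_JS = (
    'var tableData = [{"name": "alpha", "value": 10}, '
    '{"name": "beta", "value": 20}];'
)


def extract_js_json(source, var_name):
    """Return the JSON value assigned to `var_name` in a JavaScript string."""
    # Match "<var_name> =" and any whitespace before the literal starts.
    pattern = re.compile(r"\b" + re.escape(var_name) + r"\s*=\s*")
    match = pattern.search(source)
    if match is None:
        raise ValueError(f"no assignment to {var_name!r} found")
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here, the trailing semicolon).
    value, _ = json.JSONDecoder().raw_decode(source, match.end())
    return value


rows = extract_js_json(SAMPLE_JS, "tableData")
```

This only works when the literal is strict JSON (double-quoted keys and strings); JavaScript object literals with unquoted keys or single quotes would need extra cleanup first.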

ToonAlfrink