
I am doing web crawling using BeautifulSoup. I can get data from various websites, but not from some of them, and I found that those websites display their data using JavaScript.

I wrote the following script to extract data. It works fine on most sites, but not on the ones that use JavaScript to load their data.

from bs4 import BeautifulSoup
import requests

params = {"url": "search-alias=aps", "field-keywords": "j7"}
url = "http://www.amazon.in/s/ref=nb_sb_noss"

# requests only downloads the HTML the server sends; it does not run JavaScript
response = requests.get(url, params=params)
soup = BeautifulSoup(response.content, "html.parser")

# Product titles are <h2> elements with this exact class attribute string
titles = soup.find_all("h2", {"class": "a-size-medium a-color-null s-inline s-access-title a-text-normal"})
for title in titles:
    print(title.contents)
    print()

That works for this site, but some websites use JavaScript to fetch and display their data, so I can't get the data that way.

Yash Bathia

1 Answer


The nature of the web is client-server: the server delivers content to the browser, and the browser displays it. That content may include client-side scripts, which are JavaScript code that the browser executes and that may modify the DOM.
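To make this concrete, here is a minimal sketch using a hypothetical page (the HTML string and the "Example product" text are invented for illustration). The server sends an empty container plus a script that fills it in; BeautifulSoup only parses the markup it is given and never executes the script, so the container stays empty.

```python
from bs4 import BeautifulSoup

# Hypothetical server response: the data is inserted by JavaScript in the browser.
SERVER_HTML = """
<html>
  <body>
    <div id="results"></div>
    <script>
      document.getElementById("results").textContent = "Example product";
    </script>
  </body>
</html>
"""

soup = BeautifulSoup(SERVER_HTML, "html.parser")
results = soup.find("div", id="results")

# The script was never run, so the container has no text.
print(repr(results.get_text()))  # ''
# The data exists only as JavaScript source inside the <script> tag.
print("Example product" in soup.script.string)  # True
```

This is the same situation as the question: requests plus BeautifulSoup see what the server sent, not what the browser shows after its scripts run.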

So, in order to pick up the modified DOM, any client (including your Python code) would have to build a DOM from the HTML and then execute the JavaScript that modifies it, just as the browser does.
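One common way to do that from Python is to drive a real browser and parse the DOM it ends up with. The sketch below assumes Selenium 4 and a local Chrome installation; the function names fetch_rendered_html and extract_text, and the choice to wait for a CSS selector, are illustrative assumptions rather than part of the answer above. Selenium is imported inside the function so the parsing helper still works where Selenium is not installed.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, css_selector, timeout=10):
    """Open url in headless Chrome, wait for css_selector to appear, return the DOM as HTML."""
    # Imported here so the rest of this file can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without opening a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Scripts may insert data after the page itself has loaded, so wait for
        # an element that only exists once that data is present.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        # page_source returns the browser's view of the page after scripts ran.
        return driver.page_source
    finally:
        driver.quit()


def extract_text(html, tag, css_class):
    """Parse HTML (static or rendered) and collect the text of matching elements."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.find_all(tag, class_=css_class)]
```

Usage might look like extract_text(fetch_rendered_html(page_url, "h2.s-access-title"), "h2", "s-access-title"), assuming the target site still uses the class names from the question, which may well have changed. Keeping the parsing step separate means the same BeautifulSoup code works on both static and browser-rendered HTML.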

The answer to this question might give you some clues. Sadly, now that I have noticed that answer, this question should really be closed as a duplicate.

holdenweb