I would like to parse a number from this website's dashboard. The number is located beneath "Organic Search".

Using a simple cmd-F on the soup, I eventually realized that my soup doesn't contain this number at all. It would be great to hear suggestions on why this is.

svelandiag
wip

1 Answer

This page is rendered by JavaScript, so the HTML returned by the server does not contain the number.

The real data comes from this URL:

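```python
import requests

# Backend endpoint that the dashboard's JavaScript calls to load its data
r = requests.get('https://us.backend.semrush.com/?key=adb79c4ec6282f461fb0e2e67aa50949&action=report&type=url_organic&currency=usd&url=https%3A%2F%2Fwww.yelp.com%2Fbiz%2Fplayground-2-0-santa-ana-3&_=1486008342774')
r.raise_for_status()  # stop early on an HTTP error
data = r.json()  # the response body is JSON, so no HTML parsing is needed
print(data['organic']['traffic'])
```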
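
The long query string can be easier to read and adapt when it is assembled from its parts. Here is a minimal sketch using only the standard library. `build_report_url` is a hypothetical helper name, and the parameter names are simply the ones visible in the URL above. The trailing `_` parameter looks like a cache-busting timestamp and is left out on the assumption that the backend does not require it.

```python
from urllib.parse import urlencode

BASE = 'https://us.backend.semrush.com/'

def build_report_url(page_url, key):
    """Assemble the backend URL from the parameters seen in the URL above.

    Hypothetical helper: the parameter names are copied from that URL,
    and the `_` timestamp is omitted (assumed optional).
    """
    params = {
        'key': key,
        'action': 'report',
        'type': 'url_organic',
        'currency': 'usd',
        'url': page_url,  # urlencode percent-encodes this value
    }
    return BASE + '?' + urlencode(params)

report_url = build_report_url(
    'https://www.yelp.com/biz/playground-2-0-santa-ana-3',
    'adb79c4ec6282f461fb0e2e67aa50949',
)
print(report_url)
```

The resulting string can be passed to `requests.get` in place of the hand-written URL.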
宏杰李
  • You can call it without `jsoncallback=jQuery21407727922755626626_1486008342773`; you then get pure JSON, which you can parse with the `json` module. – furas Feb 02 '17 at 04:45
  • @furas Wow, thanks. – 宏杰李 Feb 02 '17 at 04:51
  • `jsoncallback` is a very popular way to send data and automatically execute the function named in `jsoncallback`. It is called `JSONP`; see [What is JSONP all about?](http://stackoverflow.com/questions/2067472/what-is-jsonp-all-about) – furas Feb 02 '17 at 04:55
  • Thanks, your answer helped me understand the problem more clearly. However, what is the proper way to parse this particular page: 'https://us.backend.semrush.com/?key=adb79c4ec6282f461fb0e2e67aa50949&action=report&type=url_organic&currency=usd&url=https%3A%2F%2Fwww.yelp.com%2Fbiz%2Fplayground-2-0-santa-ana-3&_=1486008342774'? I am considering using PhantomJS because you mentioned JavaScript, and there seems to be some way to do it with BeautifulSoup 4 as well. I am still a beginner with the DOM and web parsing in general. Thanks. – wip Feb 04 '17 at 02:25
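
On the last question: the endpoint returns JSON (or JSONP when a `jsoncallback` parameter is sent), so an HTML parser or a headless browser should not be needed for it. If only the JSONP form is available, the wrapper can be stripped before parsing. Below is a minimal sketch with the standard library. `strip_jsonp` is a hypothetical helper, and the sample string is made up for illustration: only the `organic` and `traffic` key names come from the answer, and the value `123` is invented.

```python
import json
import re

def strip_jsonp(text):
    """Parse a JSONP response such as 'callback({...});' into Python data.

    Falls back to plain JSON parsing when no callback wrapper is found.
    """
    match = re.match(r'^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$', text, re.DOTALL)
    if match is None:
        return json.loads(text)
    return json.loads(match.group(1))

# Made-up sample payload; the real response has more fields and other values.
sample = 'jQuery21407727922755626626_1486008342773({"organic": {"traffic": 123}});'
data = strip_jsonp(sample)
print(data['organic']['traffic'])
```

With `requests`, the same function could be applied to `r.text` instead of calling `r.json()`.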