
I am writing a web scraper to pull financial data and analyst recommendations. The data seems to be missing or incorrect in the node: when I extract it from the page source, I get $0.00, but the correct value is $884.23.

Here is the example code below:

import requests as rq
from bs4 import BeautifulSoup as bs

sym='cmg'
url='https://www.nasdaq.com/market-activity/stocks/{}/analyst-research'.format(sym)
page_response = rq.get(url, timeout=5)
page=bs(page_response.content, 'html.parser')
sr=page.find('div', attrs={'class':'analyst-target-price__price'})

print(sr.text)
# Output: $0.00

According to the HTML shown in the browser, the value should be $884.23 at the time of writing this question.

I assume the issue is that the page was not fully rendered when I fetched the response content. Does anyone have a solution to this?
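One way to test that hypothesis without a browser is to look at exactly what the HTTP request received: if the static HTML already contains the target `div` but only with a placeholder such as `$0.00`, the real number is most likely filled in later by JavaScript. Below is a minimal sketch using the standard library's `html.parser` (swapped in for BeautifulSoup so it runs without extra packages). The class name comes from the question; the helper names and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class DivTextFinder(HTMLParser):
    """Collect the text of every <div> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # > 0 while inside a matching div (counts nested divs)
        self.current = []   # text chunks of the div currently being read
        self.results = []   # finished text of each matching div

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        if self.depth:
            self.depth += 1  # a nested div inside a match
            return
        classes = (dict(attrs).get('class') or '').split()
        if self.target_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag != 'div' or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append(''.join(self.current).strip())

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def static_div_texts(html, target_class):
    """Return the text of matching divs as they appear in the raw (unrendered) HTML."""
    finder = DivTextFinder(target_class)
    finder.feed(html)
    finder.close()
    return finder.results


# Hypothetical snippet mimicking what the question's request returned
sample = '<div class="analyst-target-price__price">$0.00</div>'
print(static_div_texts(sample, 'analyst-target-price__price'))  # → ['$0.00']
```

If the raw HTML only holds the placeholder while the browser shows the real number, parsing the HTML more carefully will probably not help; the value has to come from whatever request the page's JavaScript makes.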

Ahmed Soliman
Charco
  • Can you share the relevant HTML source, as it appears in your program? – AMC Mar 30 '20 at 01:23
  • Stop scraping sites, and just pay for the API. – jarmod Mar 30 '20 at 01:25
  • Update: The issue is indeed that the content is dynamically generated, a common issue, and I have just the duplicate for the job. – AMC Mar 30 '20 at 01:31
  • Does this answer your question? [Web scraping program cannot find element which I can see in the browser](https://stackoverflow.com/questions/60904786/web-scraping-program-cannot-find-element-which-i-can-see-in-the-browser) – AMC Mar 30 '20 at 01:31

1 Answer

The value you are trying to scrape is generated by JavaScript, so it isn't in the page's HTML source. You can get the same value by sending the same request the JavaScript makes:

import requests as rq

sym = 'cmg'
# JSON endpoint the page's JavaScript calls to fill in the analyst price targets
url = 'https://api.nasdaq.com/api/analyst/{}/targetprice'.format(sym)
page_response = rq.get(url, timeout=5).json()
overview = page_response['data']['consensusOverview']
priceTarget = overview['priceTarget']
lowPriceTarget = overview['lowPriceTarget']
highPriceTarget = overview['highPriceTarget']

print('priceTarget', priceTarget)
print('lowPriceTarget ', lowPriceTarget)
print('highPriceTarget ', highPriceTarget)

Output:

priceTarget 884.23
lowPriceTarget  550.0
highPriceTarget  1050.0
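Because the endpoint returns JSON, it can help to keep the HTTP call separate from the key lookups, so the parsing can be tested against a saved response and a missing field fails with a clear message instead of a bare `KeyError`. A minimal sketch, assuming the response keeps the `data` → `consensusOverview` shape shown in the answer. It uses the standard library's `urllib.request` instead of `requests`, and the `User-Agent`/`Accept` header values are assumptions (some sites reject clients without them), not something the answer says Nasdaq requires.

```python
import json
import urllib.request

TARGET_URL = 'https://api.nasdaq.com/api/analyst/{}/targetprice'
FIELDS = ('priceTarget', 'lowPriceTarget', 'highPriceTarget')


def parse_targets(payload):
    """Pull the three price-target fields out of the decoded JSON."""
    try:
        overview = payload['data']['consensusOverview']
    except (KeyError, TypeError) as exc:
        raise ValueError('unexpected response shape: no data.consensusOverview') from exc
    # .get() returns None for any field the response happens to omit
    return {name: overview.get(name) for name in FIELDS}


def fetch_targets(sym, timeout=5):
    """Request the JSON the page's JavaScript loads, then parse it."""
    req = urllib.request.Request(
        TARGET_URL.format(sym),
        # Assumed browser-like headers; adjust to match what the Network tab shows
        headers={'User-Agent': 'Mozilla/5.0', 'Accept': 'application/json'},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_targets(json.load(resp))
```

Calling `fetch_targets('cmg')` needs network access; `parse_targets` can be exercised offline with a dict copied from a saved response.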
Ahmed Soliman
  • _the value you are trying to scrape is being genrated by Javascript_ Can you share the relevant HTML, I can't find it in the page. – AMC Mar 30 '20 at 01:24
  • @AMC Have a look at the answer. If you view the page's source code, you won't find the value there; that's what I mean by it being generated by JS through an AJAX call. – Ahmed Soliman Mar 30 '20 at 01:29
  • _if you viewed the source code of the page you won't find the value there , that's what I mean by it's being generated by JS through an AJAX call ._ Sorry if my comment wasn't clear, I meant that I couldn't even find the div with class `'analyst-target-price__price'` in my browser. It turns out I needed to scroll down the page in order to get it to load. – AMC Mar 30 '20 at 01:34
  • @AMC Exactly, anything that loads on scroll is generated by JS. – Ahmed Soliman Mar 30 '20 at 01:39
  • @AhmedSoliman Thanks a lot! It worked perfectly. How did you find the URL for the request the JS is making? (Sorry, I'm fairly new to web scraping and HTML.) – Charco Mar 30 '20 at 04:37
  • @Charco Accept the answer if it helped you. I found it through the browser developer tools: open the Network tab and monitor the requests the website makes; for JS requests, look under the XHR tab. – Ahmed Soliman Mar 30 '20 at 07:43
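Once the XHR request is found in the Network tab, its JSON response can be large, and it is not always obvious which key holds the number shown on the page. A small helper that searches the decoded JSON for a known value and reports every key path leading to it can speed that up. `find_value_paths` is a hypothetical helper, not part of any library, and the payload below is a trimmed, illustrative version of the structure used in the answer.

```python
def find_value_paths(obj, target, path=()):
    """Return every key/index path in nested JSON data whose value equals target."""
    matches = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            matches.extend(find_value_paths(value, target, path + (key,)))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            matches.extend(find_value_paths(value, target, path + (index,)))
    elif obj == target:
        matches.append(path)
    return matches


# Illustrative payload shaped like the one in the answer
payload = {'data': {'consensusOverview': {'priceTarget': 884.23,
                                          'lowPriceTarget': 550.0}}}
print(find_value_paths(payload, 884.23))  # → [('data', 'consensusOverview', 'priceTarget')]
```

The comparison is exact equality, so a value the API stores as a string such as `'$884.23'` would have to be searched for in that form.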