I want to scrape only the business summary of a company from its Yahoo Finance page, e.g. https://in.finance.yahoo.com/quote/AAPL?p=AAPL. The business summary is the text below the company profile, on the right side of the page. I can see that it sits in a `p` element with a class, but it is deeply nested in divs, and I am not able to navigate to it using Beautiful Soup.

I tried this:

article_text = ''
# `soup` is the BeautifulSoup object built from the downloaded page
article = soup.find_all('p', {'class': 'businessSummary Mt(10px) 0v(h) Tov(e)'})
for element in article:
    # Join all text nodes inside each matching paragraph
    article_text += '\n' + ''.join(element.find_all(string=True))
print(article_text)

But it does not return the paragraph text.

Thanks in advance. I was not able to paste the inspected source of the page here, because I could not format it in a readable way.

Miley
  • The class `businessSummary Mt(10px) 0v(h) Tov(e)` is not in the source of that page. – nijm Jul 07 '18 at 15:41
  • Is the business summary the description of the company under the "profile" tab? – Ajax1234 Jul 07 '18 at 15:41
  • The content you are looking for is inside a `script` tag. There are several `script` tags on that page. Try `soup.find_all("script")[-3]`, which fetches the third-to-last `script`; it contains the information you are trying to parse. As an alternative, try a browser simulator such as `selenium`. – SIM Jul 07 '18 at 16:43
  • There's a big react json object on that page, you need to grab it with regex and parse it with json. – pguardiario Jul 08 '18 at 04:56
  • @SIM Can you please guide me more on selenium? – Miley Jul 08 '18 at 12:59
  • Yeah sure [check out this link](https://stackoverflow.com/questions/51233325/unable-to-get-a-dynamically-generated-content-from-a-webpage). An exact same question with two solutions. – SIM Jul 08 '18 at 16:02
  • Thank you @SIM , this helped! – Miley Jul 11 '18 at 02:56
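
To illustrate the approach from the comments (find the JSON object embedded in a `script` tag, then parse it with `json`), here is a minimal sketch. It makes several assumptions that are not confirmed in this thread: that the page assigns its state to a variable named `root.App.main`, and that the summary lives under the key path shown in `SUMMARY_PATH`. `SAMPLE_HTML` is a made-up stand-in for the downloaded page, and `extract_business_summary` is a hypothetical helper name. Check the real page source and adjust both the marker and the path.

```python
import json
import re

# Made-up stand-in for the downloaded page; the real markup may differ.
SAMPLE_HTML = """
<html><body>
<script>var unrelated = 1;</script>
<script>
root.App.main = {"context": {"dispatcher": {"stores": {"QuoteSummaryStore": {"assetProfile": {"longBusinessSummary": "Example Corp designs and sells widgets."}}}}}};
</script>
</body></html>
"""

# Assumed key path to the summary inside the embedded JSON object.
SUMMARY_PATH = (
    "context", "dispatcher", "stores",
    "QuoteSummaryStore", "assetProfile", "longBusinessSummary",
)


def extract_business_summary(html):
    """Return the summary string from the embedded JSON, or None if absent."""
    # Locate the start of the assignment; the variable name is an assumption.
    marker = re.search(r"root\.App\.main\s*=\s*", html)
    if marker is None:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index and
        # ignores whatever follows it (here, the trailing ";" and markup).
        data, _ = json.JSONDecoder().raw_decode(html, marker.end())
    except json.JSONDecodeError:
        return None
    # Walk the assumed key path, bailing out if any level is missing.
    node = data
    for key in SUMMARY_PATH:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


if __name__ == "__main__":
    print(extract_business_summary(SAMPLE_HTML))
```

For a live page, the HTML string would come from something like `requests.get(url).text` instead of `SAMPLE_HTML`. Using `raw_decode` rather than a regex that tries to match the whole object avoids having to guess where the JSON ends.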
