Python BeautifulSoup4 Parsing: Hidden html elements on Yahoo Finance

Question

I am analyzing the balance sheet of Amazon on Yahoo Finance. It contains nested rows, and I cannot extract all of them. The sheet looks like this:

I used BeautifulSoup4 and the Selenium web driver to get me the following output:

The following is the code:

import pandas as pd
from bs4 import BeautifulSoup
import re
from selenium import webdriver
import string
import time

# display settings for pandas
pd.options.display.float_format = '{:.0f}'.format
pd.set_option('display.width', None)

is_link = 'https://finance.yahoo.com/quote/AMZN/balance-sheet/'

chrome_path = r"C:\Users\hecto\Documents\python\drivers\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get(is_link)

html = driver.execute_script('return document.body.innerHTML;')
soup = BeautifulSoup(html,'lxml')

features = soup.find_all('div', class_='D(tbr)')

headers = []
temp_list = []
label_list = []
final = []
index = 0
#create headers
for item in features[0].find_all('div', class_='D(ib)'):
    headers.append(item.text)
#statement contents
while index <= len(features)-1:
    #filter for each line of the statement
    temp = features[index].find_all('div', class_='D(tbc)')
    for line in temp:
        #each item adding to a temporary list
        temp_list.append(line.text)
    #temp_list added to final list
    final.append(temp_list)
    #clear temp_list
    temp_list = []
    index+=1
df = pd.DataFrame(final[1:])
df.columns = headers

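#function to make all values numerical
def convert_to_numeric(column):
    #strip thousands separators
    first_col = [i.replace(',','') for i in column]
    #blank out a lone '-' placeholder, but keep the sign of negative numbers
    second_col = ['' if i.strip() == '-' else i for i in first_col]
    final_col = pd.to_numeric(second_col)

    return final_col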

for column in headers[1:]:
    df[column] = convert_to_numeric(df[column])
final_df = df.fillna('-')

print(final_df)

I cannot get all the rows of the balance sheet in my output (e.g. Cash, Total Current Assets). Where did I go wrong? Am I missing something?

score 1 · Accepted Answer · answered Jul 28 '20 at 20:27

You may have to click the "Expand All" button to see the additional rows; the nested rows are likely not in the page HTML until the table is expanded. See this thread for how to simulate a click in Selenium: "python selenium click on button".
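
As a rough sketch of that idea, the helper below waits for the button, clicks it, and only then reads the page HTML. The XPath is an assumption: it matches the button by its visible "Expand All" text, and Yahoo's markup may differ or may have changed. The helper name `expand_all_and_get_html` is hypothetical. It uses `find_element`-style locators via `By.XPATH`, since newer Selenium releases no longer provide `find_element_by_xpath`.

```python
# Assumption: the button is matched by its visible text. Inspect the live
# page and adjust this XPath if the markup is different.
EXPAND_ALL_XPATH = "//button[.//span[text()='Expand All']]"


def expand_all_and_get_html(driver, xpath=EXPAND_ALL_XPATH, timeout=10):
    """Click the expand button, then return the rendered body HTML."""
    # Imported here so the helper can be defined without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Wait until the button is present, visible and enabled, then click it.
    button = WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable((By.XPATH, xpath))
    )
    button.click()

    # Read the HTML only after the click, so the expanded rows are included.
    return driver.execute_script("return document.body.innerHTML;")
```

In the question's script this would be called right after `driver.get(is_link)`, with the returned string passed to `BeautifulSoup(html, 'lxml')` in place of the original `html` variable. If the expanded rows render with a delay, a further wait on one of the new rows may be needed before reading the HTML.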

answered Jul 28 '20 at 20:27

Naren

This worked! All I did was add `driver.find_element_by_xpath("//button[@data-reactid = '36']").click()` before assigning the `html` variable. – Hector Jul 29 '20 at 00:24
