I have a table built from 'div' elements. Its content is dynamic: both the layout, which depends on the selected option, and the displayed data are generated with JavaScript. The HTML structure looks like this:

<div class="container-jKD0Exn-">
<div class="shrinkShadowPosition-OFmmj-q_">
<div class="shrinkShadowWrap-OFmmj-q_">
<div class="shrinkShadow-OFmmj-q_">
</div></div></div>
<div class="titleWrap-jKD0Exn-" style="box-shadow:none">
<div class="offsetPadding-jKD0Exn-" style="width:0"></div>
<span class="title-jKD0Exn- apply-overflow-tooltip">Total common shares outstanding</span></div>
<div class="filling-jKD0Exn-"></div>
<div class="values-jKD0Exn- values-ZmRZjHnV">
<div class="value-25PNPwRV">
<div class="wrap-25PNPwRV">
<div>‪22.32B‬</div>
</div></div>
<div class="value-25PNPwRV">
<div class="wrap-25PNPwRV">
<div>‪21.34B‬</div>
</div></div>
<div class="value-25PNPwRV">
<div class="wrap-25PNPwRV"><div>‪20.50B‬</div>
</div></div>

With the Python code below, the result looks like this: Total common shares outstanding22.32B21.34B20.50B19.02B17.77B16.98B16.43B16.33B Instead, I would like it in a list or in a DataFrame like this:

['Total common shares outstanding', 22.32, 21.34, 20.50, 19.02, 17.77, 16.98, 16.43, 16.33]

The Python code I'm using to scrape the data is this:

from selenium import webdriver
import pandas as pd
import requests, bs4
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

url ='https://www.tradingview.com/symbols/NASDAQ-AAPL/financials-statistics-and-ratios/'
driver = webdriver.Chrome(options=options)  # the positional driver path is not accepted by recent Selenium 4 releases
driver.get(url)
html = driver.page_source
#print(html)
soup = bs4.BeautifulSoup(html, 'html.parser')
for title in soup.find_all("div", {"class": "container-jKD0Exn-"}):
     print(title.text+'\n')

Is there any way in Selenium or BeautifulSoup to get a list like that?

Freeman

2 Answers

To print the desired texts using Selenium, you have to induce WebDriverWait for visibility_of_all_elements_located(), and you can use the following locator strategy:

  • Using xpath:

    driver.get("https://www.tradingview.com/symbols/NASDAQ-AAPL/financials-statistics-and-ratios/")
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Accept']"))).click()
    # Wait until every value cell in the row that follows the title is visible
    cells = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[text()='Total common shares outstanding']//following::div[2]//div[starts-with(@class, 'wrap')]/div")))
    # Replace the invisible bidi marks (U+202A, U+202C) that wrap each value
    df = pd.DataFrame([cell.text.replace('\u202a', ' ').replace('\u202c', ' ') for cell in cells], columns=['Total common shares outstanding'])
    print(df)
    driver.quit()
    
  • Console Output:

          Total common shares outstanding
    0                         22.32B
    1                         21.34B
    2                         20.50B
    3                         19.02B
    4                         17.77B
    5                         16.98B
    6                         16.43B
    7                         16.33B
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
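
The question asks for numbers rather than strings such as 22.32B, so a conversion step may be useful after the texts are collected. The following is only a minimal sketch under stated assumptions: parse_value, BIDI_MARKS and SCALE are hypothetical names that are not part of Selenium or pandas, and the K/M/B/T multipliers are assumed from the values shown on the page.

```python
import re

# U+202A (LEFT-TO-RIGHT EMBEDDING) and U+202C (POP DIRECTIONAL FORMATTING)
# are the invisible marks wrapped around each value on the page.
BIDI_MARKS = str.maketrans("", "", "\u202a\u202c")

# Assumed suffix multipliers; extend this if the page uses other suffixes.
SCALE = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}


def parse_value(text, scaled=False):
    """Convert a string such as '22.32B' (possibly wrapped in bidi marks) to a float.

    With scaled=False the suffix is dropped (22.32B -> 22.32);
    with scaled=True it is applied as a multiplier.
    """
    cleaned = text.translate(BIDI_MARKS).strip()
    match = re.fullmatch(r"(-?\d+(?:\.\d+)?)([KMBT]?)", cleaned)
    if match is None:
        raise ValueError(f"unrecognised value: {text!r}")
    number, suffix = float(match.group(1)), match.group(2)
    return number * SCALE[suffix] if scaled and suffix else number


# Sample strings copied from the question's HTML, including the bidi marks.
raw = ["\u202a22.32B\u202c", "\u202a21.34B\u202c", "\u202a20.50B\u202c"]
values = [parse_value(v) for v in raw]
print(values)  # [22.32, 21.34, 20.5]
```

Deleting the marks instead of replacing them with spaces avoids padded strings. The sketch raises ValueError for anything it does not recognise, for example a value written with a typographic minus sign, so such cases would need extra handling.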
    
undetected Selenium

If there is no API available (which would be the preferable option), one approach is to use BeautifulSoup with stripped_strings:

data = []
for title in soup.find_all("div", {"class": "container-jKD0Exn-"}):
     data.append(list(title.stripped_strings))

pd.DataFrame(data)

Output DataFrame:

0                                1        2        3        4        5        6        7        8
Key stats
Total common shares outstanding  22.32B   21.34B   20.50B   19.02B   17.77B   16.98B   16.43B   16.33B
Float shares outstanding         22.29B   21.32B   20.48B   18.99B   17.75B   16.96B   16.41B   16.32B
Number of employees              110.00K  116.00K  123.00K  132.00K  137.00K  147.00K  154.00K
Number of shareholders           23.50K   23.50K   23.50K   23.50K   23.50K   23.50K   23.50K
...                              ...      ...      ...      ...      ...      ...      ...      ...
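
To get closer to the list asked for in the question (a label followed by plain numbers), the strings yielded by stripped_strings can be split into a label and its values. This is a rough sketch, not a tested scraper: it runs against a shortened copy of the HTML from the question instead of the live page, the class names are copied from that snippet and may change, and it assumes every value ends in one of the suffixes K, M, B or T.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question. The \u202a / \u202c escapes
# reproduce the invisible bidi marks the page wraps around each value.
html = """
<div class="container-jKD0Exn-">
  <div class="titleWrap-jKD0Exn-">
    <span class="title-jKD0Exn-">Total common shares outstanding</span>
  </div>
  <div class="values-jKD0Exn-">
    <div class="value-25PNPwRV"><div class="wrap-25PNPwRV"><div>\u202a22.32B\u202c</div></div></div>
    <div class="value-25PNPwRV"><div class="wrap-25PNPwRV"><div>\u202a21.34B\u202c</div></div></div>
    <div class="value-25PNPwRV"><div class="wrap-25PNPwRV"><div>\u202a20.50B\u202c</div></div></div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for container in soup.find_all("div", {"class": "container-jKD0Exn-"}):
    strings = list(container.stripped_strings)
    if not strings:
        continue  # skip containers without any text
    label, *values = strings
    # stripped_strings removes whitespace only, so the bidi marks and the
    # unit suffix are stripped here before converting to float.
    numbers = [float(v.strip("\u202a\u202cKMBT")) for v in values]
    rows.append([label, *numbers])

print(rows[0])  # ['Total common shares outstanding', 22.32, 21.34, 20.5]
```

Heading rows such as "Key stats" have no values and simply come out as a one-element list. Note that dropping the suffix loses the unit, so rows in K and rows in B are no longer directly comparable.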
HedgeHog