I am new to Selenium and I want to scrape data from https://www.nasdaq.com/market-activity/stocks/aapl. I am particularly interested in the data in the Summary Data section.

As an example, I want to scrape the following data:

  1. Exchange: NASDAQ-GS
  2. Sector: Technology
  3. Industry: Computer Manufacturing

Here is the part of the HTML containing the table that I want to extract:

<table class="summary-data__table" role="table">
  <thead class="visually-hidden" role="rowgroup">
    <tr role="row">
      <th role="columnheader" scope="col">Label</th>
      <th role="columnheader" scope="col">Value</th>
    </tr>
  </thead>
  <tbody class="summary-data__table-body" role="rowgroup"><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">Exchange</td><td role="cell" class="summary-data__cell">NASDAQ-GS</td>
    </tr><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">Sector</td><td role="cell" class="summary-data__cell">Technology</td>
    </tr><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">Industry</td><td role="cell" class="summary-data__cell">Computer Manufacturing</td>
    </tr><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">1 Year Target</td><td role="cell" class="summary-data__cell">$275.00</td>
    </tr><tr class="summary-data__row" role="row" data-first-five="true" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">Today's High/Low</td><td role="cell" class="summary-data__cell">$271.00/$267.30</td>
    </tr><tr class="summary-data__row" role="row" data-first-ten="true">
      <td role="cell" class="summary-data__cellheading">Share Volume</td><td role="cell" class="summary-data__cell">26,547,493</td>
    </tr></tbody>
</table>

This is the Python code that I have so far:

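import time
from selenium import webdriver

driver = webdriver.Chrome(executable_path='chromedriver.exe')
driver.get('https://www.nasdaq.com/market-activity/stocks/aapl')
time.sleep(20)  # fixed wait to give the page time to load

# Returns a single element (the table), not a list of rows
elements = driver.find_element_by_class_name("summary-data__table")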

I am stuck as I can't iterate through the table using the code above.

Dancoding
  • Welcome to Stack Overflow. The issue is that your selector is only selecting an element that is showing up once. If you are looking to gather everything in the summary data table, you can do something like this: `driver.find_elements_by_css_selector(".summary-data__table .summary-data__row")` – Lewis Menelaws Dec 09 '19 at 00:24

3 Answers

Your code uses find_element_by_class_name, which returns only one element and accepts a single class name. You should use find_elements_by_css_selector instead: it returns all matching elements and lets you write a more specific CSS query. You can read more here if you are interested.

Change your code to this: elements = driver.find_elements_by_css_selector(".summary-data__table .summary-data__row")

This selects all rows within the summary data table.

From there, you will be able to loop through all elements and do a subquery (key / value of each).
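
A minimal sketch of that loop, assuming each row holds a label cell followed by a value cell, as in the HTML from the question. The helper names rows_to_dict and scrape_summary are hypothetical. The string "css selector" is the value of Selenium's By.CSS_SELECTOR, written out here so the helpers need no import of their own. Selenium's .text returns only rendered text, so rows that are not displayed may come back empty.

```python
CSS = "css selector"  # same string as selenium.webdriver.common.by.By.CSS_SELECTOR


def rows_to_dict(rows):
    """Map each row's label cell text to its value cell text."""
    data = {}
    for row in rows:
        cells = row.find_elements(CSS, "td")
        if len(cells) < 2:
            continue  # skip rows without a label/value pair
        data[cells[0].text.strip()] = cells[1].text.strip()
    return data


def scrape_summary(driver):
    """Collect the Summary Data rows from a page the driver has already loaded."""
    rows = driver.find_elements(CSS, ".summary-data__table .summary-data__row")
    return rows_to_dict(rows)
```

With a driver that has already opened the page, scrape_summary(driver)["Exchange"] would then give the Exchange value.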

Lewis Menelaws

To scrape the Exchange, Sector and Industry values (NASDAQ-GS, Technology and Computer Manufacturing), scroll the Summary Data section into view with scrollIntoView(), then use WebDriverWait with visibility_of_element_located() to read each cell. You can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()  # assumes chromedriver is on PATH
    wait = WebDriverWait(driver, 10)
    driver.get("https://www.nasdaq.com/market-activity/stocks/aapl")
    # Scroll the Summary Data header into view before reading the table
    header = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.summary-data__header>h2.module-header")))
    driver.execute_script("return arguments[0].scrollIntoView(true);", header)
    # Second cell of rows 1, 2 and 3: Exchange, Sector, Industry
    print(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "tbody.summary-data__table-body>tr td:nth-child(2)"))).text)
    print(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "tbody.summary-data__table-body>tr:nth-child(2) td:nth-child(2)"))).text)
    print(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "tbody.summary-data__table-body>tr:nth-child(3) td:nth-child(2)"))).text)
    driver.quit()
    
  • Using XPATH:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()  # assumes chromedriver is on PATH
    wait = WebDriverWait(driver, 10)
    driver.get("https://www.nasdaq.com/market-activity/stocks/aapl")
    # Scroll the Summary Data header into view before reading the table
    header = wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.summary-data__header>h2.module-header")))
    driver.execute_script("return arguments[0].scrollIntoView(true);", header)
    # Value cells of the first three rows: Exchange, Sector, Industry
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//tbody[@class='summary-data__table-body']/tr//following-sibling::td[2]"))).get_attribute("innerHTML"))
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//tbody[@class='summary-data__table-body']//following-sibling::tr[1]//following-sibling::td[2]"))).get_attribute("innerHTML"))
    print(wait.until(EC.visibility_of_element_located((By.XPATH, "//tbody[@class='summary-data__table-body']//following-sibling::tr[2]//following-sibling::td[2]"))).get_attribute("innerHTML"))
    driver.quit()
    
  • Console Output:

    NASDAQ-GS
    Technology
    Computer Manufacturing
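
The CSS selectors above differ only in the row index, so they can be generated rather than typed out. A small sketch, assuming the table keeps the two-column layout shown in the question; the helper name value_selector is hypothetical:

```python
def value_selector(row_number):
    """Build the CSS selector for the value cell of the given 1-based row."""
    if row_number < 1:
        raise ValueError("row_number is 1-based")
    return (
        "tbody.summary-data__table-body>tr:nth-child({}) td:nth-child(2)"
        .format(row_number)
    )


# Selectors for the Exchange, Sector and Industry rows
selectors = [value_selector(n) for n in (1, 2, 3)]
```

Each selector can then be passed to EC.visibility_of_element_located((By.CSS_SELECTOR, selector)) exactly as in the CSS_SELECTOR example.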
    
undetected Selenium
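import requests

# Fetch the summary data for AAPL as JSON
r = requests.get(
    'https://api.nasdaq.com/api/quote/AAPL/summary?assetclass=stocks').json()

# Print each field name left-aligned in a 20-character column, then its value
for key, value in r['data']['summaryData'].items():
    print("{:<20} {}".format(key, value['value']))

Output:
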
Exchange             NASDAQ-GS
Sector               Technology
Industry             Computer Manufacturing
OneYrTarget          $275.00
TodayHighLow         $271.00/$267.30
ShareVolume          26,547,493
AverageVolume        24,634,815
PreviousClose        $265.58
FiftTwoWeekHighLow   $268.25/$142.00
MarketCap            1,202,836,268,150
PERatio              22.84
ForwardPE1Yr         20.15
EarningsPerShare     $11.85
AnnualizedDividend   $3.08
ExDividendDate       Nov 7, 2019
DividendPaymentDate  Nov 14, 2019
Yield                1.17669%
Beta                 1.02
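
To keep only the fields asked about in the question, the decoded JSON can be filtered into a plain dict. This is a sketch under the assumption that the response keeps the data → summaryData → field → value nesting used above. The helper name extract_summary is hypothetical, and the sample payload is hand-built from the output above rather than a real response:

```python
def extract_summary(payload, keys=None):
    """Return {field: value} from the decoded summary JSON.

    keys: optional iterable restricting the result to those fields.
    """
    summary = payload["data"]["summaryData"]
    wanted = summary.keys() if keys is None else keys
    # Fields missing from the response are skipped rather than raising
    return {k: summary[k]["value"] for k in wanted if k in summary}


# Hand-built sample shaped like the response (values from the output above)
sample = {
    "data": {
        "summaryData": {
            "Exchange": {"value": "NASDAQ-GS"},
            "Sector": {"value": "Technology"},
            "Industry": {"value": "Computer Manufacturing"},
            "OneYrTarget": {"value": "$275.00"},
        }
    }
}

print(extract_summary(sample, keys=["Exchange", "Sector", "Industry"]))
```

With the real response, the call would be extract_summary(r, keys=["Exchange", "Sector", "Industry"]), where r is the decoded JSON from the request above.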
  • Your response doesn't answer the question. Although an API is the better way to retrieve information 9 times out of 10, the OP might be required to use Selenium by their project requirements. Also, this JSON response doesn't have the fields from the summary data that the OP is looking for. – Lewis Menelaws Dec 09 '19 at 00:32
  • @Lewis Maybe you need to run the code to see the output? – αԋɱҽԃ αмєяιcαη Dec 09 '19 at 00:37
  • Your post was edited after my comment. Even so, your answer is efficient but doesn't answer the question. – Lewis Menelaws Dec 09 '19 at 00:40
  • @Lewis I edited my post after your comment to show the location of the data. Review the past edit to see that it's the same link and the same details; I only accessed the dict. Anyway, that judgment is up to the `OP`. – αԋɱҽԃ αмєяιcαη Dec 09 '19 at 00:41
  • @αԋɱҽԃαмєяιcαη How did you find the API link so quickly? I spent many hours looking for it but couldn't find it! I searched Google and skimmed through the website many times. Do you have any tricks for finding an API quickly? – Dancoding Dec 09 '19 at 02:21