`from selenium import webdriver
import pandas as pd
import re

# Read the Excel file with the links
df = pd.read_excel('file.xlsx')

# Create empty lists to store the extracted data
company_names = []
earnings_dates = []

# Set up the Selenium driver
driver = webdriver.Chrome()

# Iterate over the links in the DataFrame
for index, row in df.iterrows():
    url = row['Link']  # Assuming the links are in column 'Link'

    # Load the URL in the browser
    driver.get(url)

    # Extract the company name using regular expressions
    try:
        html_content = driver.page_source
        match = re.search(r'<h1 class="D\(ib\) Fz\(18px\)">(.*?)</h1>', html_content)
        if match:
            company_name = match.group(1)
        else:
            company_name = 'Company name not found'
    except:
        company_name = 'Company name not found'

    # Extract the earnings date
    try:
        earnings_date_element = driver.find_element_by_xpath('//td[contains(text(), "Earnings Date")]/following-sibling::td')
        earnings_date = earnings_date_element.text.strip()
    except:
        earnings_date = 'Earnings date not found'

    # Append the extracted data to the lists
    company_names.append(company_name)
    earnings_dates.append(earnings_date)

# Close the Selenium driver
driver.quit()

# Create a new DataFrame with the extracted data
df_extracted = pd.DataFrame({'Link': df['Link'], 'Company Name': company_names, 'Earnings Date': earnings_dates})

# Print the extracted data
print(df_extracted)`

With the code above I am able to extract the company name, but I am unable to extract the earnings date.

URL: https://finance.yahoo.com/quote/A?p=A&.tsrc=fin-srch

I am trying to extract the result below:

Agilent Technologies, Inc. (A)
Earnings Date: Aug 14, 2023 - Aug 18, 2023

1 Answer


On the Yahoo Finance page, the company name is inside the only <h1> tag on the page.


Solution

To extract the Company Name and the Earnings Date you can use the following locator strategies:

driver.get("https://finance.yahoo.com/quote/A?p=A&.tsrc=fin-srch")
print(driver.find_element(By.CSS_SELECTOR, "h1").text)
print(driver.find_element(By.XPATH, "//td[.//span[text()='Earnings Date']]//following-sibling::td[1]").text)

Console Output:

Agilent Technologies, Inc. (A)
Aug 14, 2023 - Aug 18, 2023
  

Note: You have to add the following import:

from selenium.webdriver.common.by import By
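
As for why the XPath in the question likely finds nothing: the locator above suggests that the "Earnings Date" label sits in a <span> inside the <td>, so the <td> itself has no direct text node for contains(text(), "Earnings Date") to match. Below is a minimal sketch of that difference using only the standard library. The HTML row is a simplified, hypothetical stand-in (the real Yahoo Finance markup may differ), and the variable names are just for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified row: an assumption modelled on the answer's XPath,
# not a copy of the real Yahoo Finance markup.
ROW = """
<table>
  <tr>
    <td><span>Earnings Date</span></td>
    <td><span>Aug 14, 2023 - Aug 18, 2023</span></td>
  </tr>
</table>
"""

root = ET.fromstring(ROW)
label_td, value_td = root.find("tr").findall("td")

# Direct text of the label <td>: empty, because the label lives in the <span>.
# This is the text that an XPath text() test on the <td> would look at.
direct_text = (label_td.text or "").strip()

# Text including descendants: this is where "Earnings Date" actually is,
# which is why matching on the nested <span> works.
full_text = "".join(label_td.itertext()).strip()

# The value is read the same way from the sibling <td>.
value = "".join(value_td.itertext()).strip()

print(repr(direct_text))
print(full_text)
print(value)
```

Separately, on Selenium 4.3 and later the find_element_by_xpath helper no longer exists, so the bare except in the question would hide the resulting AttributeError and always yield 'Earnings date not found'. driver.find_element(By.XPATH, ...) is the replacement.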
undetected Selenium