`from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd
import re

# Read the Excel file with the links
df = pd.read_excel('file.xlsx')

# Create empty lists to store the extracted data
company_names = []
earnings_dates = []

# Set up the Selenium driver
driver = webdriver.Chrome()

# Iterate over the links in the DataFrame
for index, row in df.iterrows():
    url = row['Link']  # Assuming the links are in column 'Link'
    # Load the URL in the browser
    driver.get(url)

    # Extract the company name using regular expressions
    try:
        html_content = driver.page_source
        match = re.search(r'<h1 class="D\(ib\) Fz\(18px\)">(.*?)</h1>', html_content)
        company_name = match.group(1) if match else 'Company name not found'
    except Exception:
        company_name = 'Company name not found'

    # Extract the earnings date.
    # The find_element_by_* helpers were removed in recent Selenium 4 releases,
    # so find_element(By.XPATH, ...) is used instead.
    # contains(., ...) also matches when the label text sits in a child element
    # such as a <span> (an assumption about the page markup).
    try:
        earnings_date_element = driver.find_element(
            By.XPATH, '//td[contains(., "Earnings Date")]/following-sibling::td')
        earnings_date = earnings_date_element.text.strip()
    except Exception:
        earnings_date = 'Earnings date not found'

    # Append the extracted data to the lists
    company_names.append(company_name)
    earnings_dates.append(earnings_date)

# Close the Selenium driver
driver.quit()

# Create a new DataFrame with the extracted data
df_extracted = pd.DataFrame({'Link': df['Link'], 'Company Name': company_names, 'Earnings Date': earnings_dates})

# Print the extracted data
print(df_extracted)`
With the code above I am able to extract the company name, but I am unable to extract the earnings date.
For https://finance.yahoo.com/quote/A?p=A&.tsrc=fin-srch I am trying to extract the following result: company name "Agilent Technologies, Inc. (A)" and earnings date "Aug 14, 2023 - Aug 18, 2023".
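
To make the target concrete, here is a minimal offline sketch of the extraction, run against a hard-coded fragment instead of the live page. The `SAMPLE_HTML` markup and the helper name `extract_earnings_date` are assumptions made up for illustration; the real page's markup may differ, so treat this as a way to test the parsing logic, not as a description of the site.

```python
import re
from html import unescape

# Hypothetical fragment modeled on a label/value table row.
# The real page's markup is an assumption and may differ.
SAMPLE_HTML = (
    '<tr><td><span>Earnings Date</span></td>'
    '<td><span>Aug 14, 2023</span> - <span>Aug 18, 2023</span></td></tr>'
)


def extract_earnings_date(html):
    """Return the text of the cell after the 'Earnings Date' label, or None."""
    # After the label, skip any closing tags, then capture the next <td>.
    match = re.search(
        r'Earnings Date\s*(?:</[^>]+>\s*)*<td[^>]*>(.*?)</td>',
        html,
        flags=re.DOTALL,
    )
    if match is None:
        return None
    # Drop nested tags such as <span>, then collapse whitespace.
    text = re.sub(r'<[^>]+>', '', match.group(1))
    return ' '.join(unescape(text).split())


print(extract_earnings_date(SAMPLE_HTML))
```

In the Selenium loop, the same helper could be called on `driver.page_source`, mirroring how the company name is already pulled out with `re.search`.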