I am having trouble piecing together the last part of a puzzle. The entire code is shown below, which includes a non-essential username and password to a site where I am scraping data.
After looping through part numbers from an Excel file using
pd.read_excel()
Selenium is used to scrape various items of the website in question; the code then writes these values to the output window successfully.
As opposed to writing the data to an output window, I aim to write to the same Excel file I am pulling data from, writing it to the appropriate columns.
In the final for
loop of the code, I initially tried to write the variables (which were printing to the screen) to Excel by appending
.to_excel('filePathHere')
to the variable in question. As an example, I attempted
description.to_excel('pathToFile/output.xlsx')
Which yield an error of EOL while scanning string literal (<string>, line 1)
I then thought, maybe this variable needs to be converted to a DataFrame, so I then tried
description_DataFrame = pd.DataFrame(description)
description_DataFrame.to_excel('pathToFile/output.xlsx')
which resulted in the same error message.
I am not even sure if this is the correct logic to write each item to the existing (or new) file. If it is, I found an explanation on how to deal with long strings here: StackOverFlow EOL Error but none of my data constitutes as long strings, so I can't see how that applies.
I then start to think I might need to create a dictionary, and then append to it. So I then removed any attempts from above and tried:
description = []
description.append(mfg_part)
mfg_part.to_excel('pathToFile/output.xlsx')
Which still give me the same EOL error.
I am not to sure what is wrong, and why I can't write the variables mfg_part, mfg_OEM, description
to their respective columns in the loaded Excel file.
Any hints / tips would be greatly appreciated.
complete working code, printing to the screen is as follows:
import time
#Need Selenium for interacting with web elements
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
#Need numpy/pandas to interact with large datasets
import numpy as np
import pandas as pd
import itertools
# load in manufacture part number from a collection of components, via an Excel file
mfg_id_list = pd.read_excel("C:/Users/James/Documents/Python Scripts/jupyterNoteBooks/ScrapingData/MasterQuoteTemplate.xls")['Model']
# Create a dictionary to store product and price
# While the below works just fine, we want to create en empty pandas dataframe, so we can output to Excel later
productInfo = {}
chrome_path = r"C:\Users\James\Documents\Python Scripts\jupyterNoteBooks\ScrapingData\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.maximize_window()
driver.get("https://www.tessco.com/login")
userName = "FirstName.SurName321123@gmail.com"
password = "PasswordForThis123"
#Set a wait, for elements to load into the DOM
wait10 = WebDriverWait(driver, 10)
wait20 = WebDriverWait(driver, 20)
wait30 = WebDriverWait(driver, 30)
elem = wait10.until(EC.element_to_be_clickable((By.ID, "userID")))
elem.send_keys(userName)
elem = wait10.until(EC.element_to_be_clickable((By.ID, "password")))
elem.send_keys(password)
#Press the login button
driver.find_element_by_xpath("/html/body/account-login/div/div[1]/form/div[6]/div/button").click()
for i in mfg_id_list:
#Expand the search bar
searchBar = wait10.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#searchBar input")))
#Enter information into the search bar
#If cell is not blank
if len(str(i)) != 0:
searchBar.send_keys(Keys.CONTROL, 'a')
searchBar.send_keys(i)
driver.find_element_by_css_selector('a.inputButton').click()
time.sleep(5)
try:
# wait for the products information to be loaded
products = wait10.until(EC.presence_of_all_elements_located((By.XPATH,"//div[@class='CoveoResult']")))
#isProductsThere = driver.find_element_by_xpath("//div[@class='CoveoResult']")
if products:
# iterate through all products in the search result and add details to dictionary
for product in products:
# get product info such as OEM, Description and Part Number
productDescr = product.find_element_by_xpath(".//a[@class='productName CoveoResultLink hidden-xs']").text
mfgPart = product.find_element_by_xpath(".//ul[@class='unlisted info']").text.split('\n')[3]
mfgName = product.find_element_by_tag_name("img").get_attribute("alt")
# There are multiple classes, some are "class sale" or else.
#We will locate by CSS
price = product.find_element_by_css_selector("div.price").text.split('\n')[1]
# add details to dictionary
productInfo[mfgPart, mfgName, productDescr] = price
# prints the searched products information
for (mfg_part, mfg_OEM, description), price in productInfo.items():
mfg_id = mfg_part.split(': ')[1]
if mfg_id == i:
#Here is where I would write to an Excel file
#And where I made attempts as described above
print('________________________________________________')
print('Part #:', mfg_id)
print('Company:', mfg_OEM)
print('Description:', description)
print('Price:', price)
print('________________________________________________')
#time.sleep(5)
#driver.close()
else:
mfg_id = "Not on Tessco"
mfg_OEM = "Not on Tessco"
description = "Not on Tessco"
price = "Not on Tessco"
#driver.close()
print("Item was not found on Tessco.com")
except Exception as e:
print('________________________________________________')
print(e)
mfg_id = "Not on Tessco"
mfg_OEM = "Not on Tessco"
description = "Not on Tessco"
price = "Not on Tessco"
#driver.close()
print("Item was not found on Tessco.com")
print('________________________________________________')
driver.close()