Python, Selenium, Pandas DataFrame and Excel

Question

I am having trouble piecing together the last part of a puzzle. The entire code is shown below, which includes a non-essential username and password to a site where I am scraping data.

After looping through part numbers from an Excel file using

pd.read_excel()

Selenium is used to scrape various items of the website in question; the code then writes these values to the output window successfully.

As opposed to writing the data to an output window, I aim to write to the same Excel file I am pulling data from, writing it to the appropriate columns.

In the final for loop of the code, I initially tried to write the variables (which were printing to the screen) to Excel by appending

.to_excel('filePathHere')

to the variable in question. As an example, I attempted

description.to_excel('pathToFile/output.xlsx')

Which yield an error of EOL while scanning string literal (<string>, line 1)

I then thought, maybe this variable needs to be converted to a DataFrame, so I then tried

description_DataFrame = pd.DataFrame(description)
description_DataFrame.to_excel('pathToFile/output.xlsx')

which resulted in the same error message.

I am not even sure if this is the correct logic to write each item to the existing (or new) file. If it is, I found an explanation on how to deal with long strings here: StackOverFlow EOL Error but none of my data constitutes as long strings, so I can't see how that applies.

I then start to think I might need to create a dictionary, and then append to it. So I then removed any attempts from above and tried:

description = []
description.append(mfg_part)
mfg_part.to_excel('pathToFile/output.xlsx')

Which still give me the same EOL error.

I am not to sure what is wrong, and why I can't write the variables mfg_part, mfg_OEM, description to their respective columns in the loaded Excel file.

Any hints / tips would be greatly appreciated.

complete working code, printing to the screen is as follows:

import time
#Need Selenium for interacting with web elements
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys

#Need numpy/pandas to interact with large datasets
import numpy as np
import pandas as pd
import itertools


# load in manufacture part number from a collection of components, via an Excel file
mfg_id_list = pd.read_excel("C:/Users/James/Documents/Python Scripts/jupyterNoteBooks/ScrapingData/MasterQuoteTemplate.xls")['Model']

# Create a dictionary to store product and price
# While the below works just fine, we want to create en empty pandas dataframe, so we can output to Excel later
productInfo = {}

chrome_path = r"C:\Users\James\Documents\Python Scripts\jupyterNoteBooks\ScrapingData\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.maximize_window()
driver.get("https://www.tessco.com/login")

userName = "FirstName.SurName321123@gmail.com"
password = "PasswordForThis123"

#Set a wait, for elements to load into the DOM
wait10 = WebDriverWait(driver, 10)
wait20 = WebDriverWait(driver, 20)
wait30 = WebDriverWait(driver, 30)

elem = wait10.until(EC.element_to_be_clickable((By.ID, "userID"))) 
elem.send_keys(userName)

elem = wait10.until(EC.element_to_be_clickable((By.ID, "password"))) 
elem.send_keys(password)

#Press the login button
driver.find_element_by_xpath("/html/body/account-login/div/div[1]/form/div[6]/div/button").click()

for i in mfg_id_list:

    #Expand the search bar
    searchBar = wait10.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "#searchBar input")))

    #Enter information into the search bar
    #If cell is not blank

    if len(str(i)) != 0:
        searchBar.send_keys(Keys.CONTROL, 'a')
        searchBar.send_keys(i)
        driver.find_element_by_css_selector('a.inputButton').click()
        time.sleep(5)

        try:
            # wait for the products information to be loaded
            products = wait10.until(EC.presence_of_all_elements_located((By.XPATH,"//div[@class='CoveoResult']")))
            #isProductsThere = driver.find_element_by_xpath("//div[@class='CoveoResult']")

            if products:
                # iterate through all products in the search result and add details to dictionary
                for product in products:

                    # get product info such as OEM, Description and Part Number
                    productDescr = product.find_element_by_xpath(".//a[@class='productName CoveoResultLink hidden-xs']").text
                    mfgPart = product.find_element_by_xpath(".//ul[@class='unlisted info']").text.split('\n')[3]
                    mfgName = product.find_element_by_tag_name("img").get_attribute("alt")

                    # There are multiple classes, some are "class sale" or else.
                    #We will locate by CSS
                    price = product.find_element_by_css_selector("div.price").text.split('\n')[1]


                    # add details to dictionary
                    productInfo[mfgPart, mfgName, productDescr] = price

                # prints the searched products information   
                for (mfg_part, mfg_OEM, description), price in productInfo.items():
                    mfg_id = mfg_part.split(': ')[1]
                    if mfg_id == i:
                        #Here is where I would write to an Excel file
                        #And where I made attempts as described above
                        print('________________________________________________')
                        print('Part #:', mfg_id)
                        print('Company:', mfg_OEM)
                        print('Description:', description)
                        print('Price:', price)
                        print('________________________________________________')


                #time.sleep(5)
                #driver.close()

            else:
                mfg_id = "Not on Tessco"
                mfg_OEM = "Not on Tessco"
                description = "Not on Tessco"
                price = "Not on Tessco"
                #driver.close()
                print("Item was not found on Tessco.com")


        except Exception as e:
            print('________________________________________________')            
            print(e)
            mfg_id = "Not on Tessco"
            mfg_OEM = "Not on Tessco"
            description = "Not on Tessco"
            price = "Not on Tessco"
            #driver.close()
            print("Item was not found on Tessco.com")
            print('________________________________________________')

driver.close()

Python, Selenium, Pandas DataFrame and Excel

0 Answers0