
I use the code below to transfer scraped data to an Excel sheet. It works, but the downside is the execution time: it takes around 5-10 seconds to transfer approx. 200 rows x 2 columns (400 elements) that way. I was hoping someone could point me to a more efficient solution.

Python code:

from selenium import webdriver
import win32com.client as win32

driver = webdriver.Chrome(executable_path=r'C:/chromedriver.exe')
driver.get('https://www...table')

# Collect the first and second column cells of the target table
First = driver.find_elements_by_xpath('//table[@id="cr1"]/tbody/tr/td[1]')
Second = driver.find_elements_by_xpath('//table[@id="cr1"]/tbody/tr/td[2]')

# Write the values cell by cell through the Excel COM interface
xlapp = win32.Dispatch('Excel.Application')
wbook = xlapp.Workbooks.Open(r'Test.xlsm')
sheet = wbook.Worksheets('COMM')
for i in range(len(First)):
    sheet.Cells(i + 1, 1).Value = First[i].text   # Excel cells are 1-indexed
    sheet.Cells(i + 1, 2).Value = Second[i].text
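
As a side note on the loop above: each sheet.Cells(...).Value assignment is a separate COM round trip, which is usually what dominates the runtime. A minimal sketch of pushing the whole block to Excel in a single range assignment instead (assuming the same sheet, First and Second objects as above; untested against this workbook):

# Pull the text out of the WebElements first, then write the whole
# 2D block in one COM call instead of one call per cell
data = [[f.text, s.text] for f, s in zip(First, Second)]
if data:
    sheet.Range(sheet.Cells(1, 1), sheet.Cells(len(data), 2)).Value = data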
  • Look into using pandas. I would also recommend writing to a CSV file and not Excel, but only if it's the data you are after and not formatting or formulas. – Luke Hamilton Jan 13 '22 at 15:14

1 Answer


An efficient approach would be to scrape the data from the table and, using pandas and a DataFrame, write the data to a CSV/XLS/XLSX file as follows:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

driver = webdriver.Chrome(executable_path=r'C:/chromedriver.exe')
driver.get('https://www...table')
# Wait for the table cells to become visible, then extract their text
First = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//table[@id="cr1"]/tbody/tr/td[1]')))]
Second = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//table[@id="cr1"]/tbody/tr/td[2]')))]
df = pd.DataFrame(data=list(zip(First, Second)), columns=['First', 'Second'])
print(df)
df.to_excel(r'C:\Data_Files\output_files\Test.xlsx', index=False)
driver.quit()
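
As the earlier comment suggests, if only the raw data matters (no formatting or formulas), writing a CSV instead of an Excel file is a one-line change; the path below simply mirrors the one used above:

# CSV output avoids the Excel writer dependency entirely
df.to_csv(r'C:\Data_Files\output_files\Test.csv', index=False)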
  • Thanks a lot for the response. As far as I can see, the execution time for a single column with approx. 200 values: First = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//table[@id="cr1"]/tbody/tr/td[1]')))] is about 3-4 seconds, so I'd end up with a similar total execution time. If I see it correctly, the text extraction from the WebElements seems to be the time-consuming part. Or am I wrong here? – Tom Jan 19 '22 at 07:41
  • @Tom Without looking at the HTML and your framework/code, it would be unjustified to speak about performance. – undetected Selenium Jan 19 '22 at 08:28
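
On the per-element .text extraction discussed in the comments above, one possible sketch (an assumption, not tested against this particular page, and requiring a parser such as lxml to be installed) is to grab the rendered page source once and let pandas parse the whole table, avoiding a WebDriver call per cell:

import pandas as pd

# read_html parses every matching <table> in the HTML and returns a list
# of DataFrames; attrs filters on the table's id attribute
tables = pd.read_html(driver.page_source, attrs={'id': 'cr1'})
df = tables[0].iloc[:, :2]        # keep the first two columns
df.columns = ['First', 'Second']
df.to_excel(r'C:\Data_Files\output_files\Test.xlsx', index=False)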