
I use the code below to transfer scraped data to an Excel sheet. It works, but the downside is the execution time: it takes around 5-10 seconds to transfer approx. 200 rows x 2 columns (400 elements) that way. I was hoping someone could point me to a more efficient solution.

Python code:

from selenium import webdriver
import win32com.client as win32

driver = webdriver.Chrome(executable_path=r'C:/chromedriver.exe')
driver.get('https://www...table')

# Collect the first and second column cells of the target table
First = driver.find_elements_by_xpath('//table[@id="cr1"]/tbody/tr/td[1]')
Second = driver.find_elements_by_xpath('//table[@id="cr1"]/tbody/tr/td[2]')

# Write the values cell by cell through the Excel COM interface
xlapp = win32.Dispatch('Excel.Application')
wbook = xlapp.Workbooks.Open(r'Test.xlsm')
sheet = wbook.Worksheets('COMM')
for i in range(len(First)):
    sheet.Cells(i + 1, 1).Value = First[i].text   # Excel cells are 1-indexed
    sheet.Cells(i + 1, 2).Value = Second[i].text
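
As a side note on the loop above: each sheet.Cells(...).Value assignment is a separate COM round trip, which is usually what dominates the runtime. A minimal sketch of pushing the whole block to Excel in a single range assignment instead (assuming the same sheet, First and Second objects as above; untested against this workbook):

# Pull the text out of the WebElements first, then write the whole
# 2D block in one COM call instead of one call per cell
data = [[f.text, s.text] for f, s in zip(First, Second)]
if data:
    sheet.Range(sheet.Cells(1, 1), sheet.Cells(len(data), 2)).Value = data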
  • Look into using pandas. I would also recommend writing to a CSV file and not Excel, but only if it's the data you are after and not formatting or formulas. – Luke Hamilton Jan 13 '22 at 15:14

1 Answer


An efficient approach would be to scrape the data from the table and, using pandas and a DataFrame, write the data to a CSV/XLS/XLSX file as follows:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

driver = webdriver.Chrome(executable_path=r'C:/chromedriver.exe')
driver.get('https://www...table')
# Wait for the table cells to become visible, then extract their text
First = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//table[@id="cr1"]/tbody/tr/td[1]')))]
Second = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//table[@id="cr1"]/tbody/tr/td[2]')))]
df = pd.DataFrame(data=list(zip(First, Second)), columns=['First', 'Second'])
print(df)
df.to_excel(r'C:\Data_Files\output_files\Test.xlsx', index=False)
driver.quit()
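
As the earlier comment suggests, if only the raw data matters (no formatting or formulas), writing a CSV instead of an Excel file is a one-line change; the path below simply mirrors the one used above:

# CSV output avoids the Excel writer dependency entirely
df.to_csv(r'C:\Data_Files\output_files\Test.csv', index=False)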
  • Thanks a lot for the response. As far as I can see, the execution time for a single column with approx. 200 values: First = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//table[@id="cr1"]/tbody/tr/td[1]')))] is about 3-4 seconds, so I'd end up with a similar total execution time. If I see it correctly, the text extraction from the WebElements seems to be the time-consuming part. Or am I wrong here? – Tom Jan 19 '22 at 07:41
  • @Tom Without looking at the HTML and your framework/code, it would be unjustified to speak about performance. – undetected Selenium Jan 19 '22 at 08:28
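
On the per-element .text extraction discussed in the comments above, one possible sketch (an assumption, not tested against this particular page, and requiring a parser such as lxml to be installed) is to grab the rendered page source once and let pandas parse the whole table, avoiding a WebDriver call per cell:

import pandas as pd

# read_html parses every matching <table> in the HTML and returns a list
# of DataFrames; attrs filters on the table's id attribute
tables = pd.read_html(driver.page_source, attrs={'id': 'cr1'})
df = tables[0].iloc[:, :2]        # keep the first two columns
df.columns = ['First', 'Second']
df.to_excel(r'C:\Data_Files\output_files\Test.xlsx', index=False)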