I wrote a simple example. this script reads an excel file and copy from it only the rows where, in the first column, there is one of the strings specified in the list "ls". the copied rows will be paste in another excel file. below my code:
from datetime import datetime
import openpyxl
print("Current Time =", datetime.now().strftime("%H:%M:%S"))
src_wb = openpyxl.load_workbook("C:\\Users\\Admin\\Desktop\\database.xlsx")
print("Current Time =", datetime.now().strftime("%H:%M:%S"))
print("\n\n")
src_ws = src_wb[src_wb.sheetnames[0]]
dst_wb = openpyxl.Workbook()
dst_ws = dst_wb.active
dst_ws.title = "TEST"
dst_row_id, src_row_id = 0, 0
ls = ["ciao", "hola", "hello"]
my_column_id = 1
for row in src_ws.rows:
src_row_id += 1
#print("Current Time =", datetime.now().strftime("%H:%M:%S"))
for element in ls:
if src_ws.cell(src_row_id, my_column_id).value == element:
dst_row_id += 1
dst_column_id = 0
for cell in row:
dst_column_id += 1
dst_cell = dst_ws.cell(dst_row_id,column=dst_column_id)
dst_cell.value = cell.value
break
dst_wb.save("test.xlsx")
dst_wb.close()
below the output:
λ python stack.py
Current Time = 19:36:38
Current Time = 19:37:20 <- after this point the script is quite fast
this code works good, but when my source excel files are heavy, the instruction src_wb = openpyxl.load_workbook("C:\\Users\\Admin\\Desktop\\database.xlsx")
takes many times. in my example, the "database.xlsx" file has 177137 rows, and 14 columns. to open it, openpyxl takes 42 secons, but when it has to read the rows, it seems quite fast.
to improve the situation I tried to use the "read only" mode (openpyxl.load_workbook("C:\\Users\\Admin\\Desktop\\database.xlsx", read_only = True
) but in this way the script has become too slow.. it can open the "database.xlsx" file quickly, but in the "copy and paste" is very slow (it's so slow that to complete the task it needs hours!) see the output below:
λ python stack.py
Current Time = 19:37:47
Current Time = 19:37:52 <-- after this point the script is very slow
How can I improve the script performances in read only mode? or maybe in another way, I don't know.. from your experience, what is the best solution to improve my code?
@Daniel Ocando: my excel file is composed by simple strings. below an example: