Let us say i have an excel file with 100k rows. My code is trying to read it row by row, and for each row do computation (including benchmark of how long it takes to perform each row). Then, my code will produce an array of results, with 100k rows. I did my python code but it is not efficient and taking me several days and also the benchmark results getting worse due to high consumption of memory i guess. Please see my attempt and let me know how to improve it.
My code save results=[] and only write it at the end. Also, at the start i store the whole excel file in worksheet.. I think like this will cause memory issue since my excel has very large text in cells (not only numbers).
ExcelFileName = 'Data.xlsx'
workbook = xlrd.open_workbook(ExcelFileName)
worksheet = workbook.sheet_by_name("Sheet1") # We need to read the data
num_rows = worksheet.nrows #Number of Rows
num_cols = worksheet.ncols #Number of Columns
results=[]
for curr_row in range(1,num_rows,1):
row_data = []
for curr_col in range(0, num_cols, 1):
data = worksheet.cell_value(curr_row, curr_col) # Read the data in the current cell
row_data.append(data)
#### do computation here ####
## save results like results+=[]
### save results array in dataframe and then print it to excel
df = pd.DataFrame(results)
writer = pd.ExcelWriter("XX.xlsx", engine="xlsxwriter")
df.to_excel(writer, sheet_name= 'results')
writer.save()
What i would like is to read the first row from excel and store it in memory, do the calculation, get the result and save it into excel,, then go for the second row,,, without keep memory so busy. By doing so, i will not have results array containing 100k rows, since each loop i erase it.