I have a requirement to create an Excel workbook with multiple tabs. I was able to do it with pandas and openpyxl:
import math
import pandas as pd

def write_to_excel(file, sheet, dfM, mode='a'):
    # Remove duplicate columns from the dataframe, if any.
    dfM = dfM.loc[:, ~dfM.columns.duplicated()]
    totRows = len(dfM.index)
    totCols = len(dfM.columns)

    # Split into multiple sheets if more than 100K rows.
    rows_per_sheet = 100000
    number_of_sheets = max(1, math.ceil(totRows / rows_per_sheet))
    start_index = 0
    end_index = rows_per_sheet

    for sheet_num in range(number_of_sheets):
        df = dfM.iloc[start_index:end_index]          # splitting based on the row index
        sheet_nme = sheet + '_' + str(sheet_num + 1)  # adding the sheet number to the sheet name

        # Create a pandas ExcelWriter using openpyxl as the engine.
        # Only the first sheet needs mode='w'; every later sheet is appended.
        # (The write_only/optimized_write flags are dropped here: they cannot be
        # combined with appending to an existing file.)
        if sheet_num > 0:
            mode = 'a'
        writer = pd.ExcelWriter(file, engine='openpyxl', mode=mode)

        # Write this slice of the dataframe to its own sheet.
        df.to_excel(writer, sheet_name=sheet_nme, index=False)

        # Get the openpyxl worksheet object for formatting.
        worksheet = writer.sheets[sheet_nme]
        # Loops 1 to 4: iterate through the columns and rows to apply the formats:
        #   background coloring & bolding of the header row for some columns,
        #   formatting of every column and background coloring of some particular columns,
        #   font & background coloring of the last summary row.

        # Finally write the file (close() also saves it).
        writer.close()

        start_index = end_index
        end_index = end_index + rows_per_sheet
        df = ''  # clearing memory

    # Clearing dataframes that are no longer needed.
    dfM = ''
    df_format = ''
    return True
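For reference, this is a minimal sketch of openpyxl's write-only mode used directly rather than through ExcelWriter. As far as I understand, a write-only workbook streams rows out one at a time but has to be written in a single pass and cannot be reopened in append mode, so it would mean restructuring the function above (the chunks variable and the file/sheet names below are just placeholders):

from openpyxl import Workbook

# Write-only sketch: rows are streamed to disk, so only one row of data
# is held in memory at a time. 'chunks' stands for the 100K-row slices.
wb = Workbook(write_only=True)
for sheet_num, chunk in enumerate(chunks):
    ws = wb.create_sheet('MySheet_' + str(sheet_num + 1))
    ws.append(list(chunk.columns))                       # header row
    for row in chunk.itertuples(index=False, name=None):
        ws.append(list(row))
wb.save('output.xlsx')                                   # every tab must be written before saving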
The problem is that I am dealing with very large files, around a million rows, and the script crashes due to memory issues.
Each tab will not exceed 100K rows; I write one tab at a time, close the writer, and then start the next tab in the same Excel file.
So I am looking for a faster and more memory-efficient way to achieve this. I am thinking of converting this to pyexcelerate to speed it up, but I am still not sure what to do about the memory issue.
Any thoughts?
I also tried writing separate files and combining them into one workbook at the end, but I could not find a Python library that supports merging sheets from multiple Excel workbooks and runs on Linux.
I may have some 500K rows and around 275 columns per sheet; the columns include both numeric and text data. It works fine up to 100K rows or so but gets stuck after that.
I am trying pyexcelerate right now to see whether it will work.
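This is roughly the minimal pyexcelerate usage I am experimenting with (df here is one 100K-row slice like in the function above; the file and sheet names are placeholders, and I am not sure yet how well its styling API covers my header and summary-row formatting):

from pyexcelerate import Workbook

# pyexcelerate sketch: build the sheet as a list of lists and let the library
# serialise it in bulk, which is typically much faster than openpyxl.
wb = Workbook()
data = [df.columns.tolist()] + df.values.tolist()   # header row followed by the data rows
wb.new_sheet('MySheet_1', data=data)
wb.save('output.xlsx')                              # all sheets go out in a single save()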