As title, I need to create multiple spreadsheets into a excel file with Pandas. While this thread and this one all provided solutions, I figured my situation is a bit different. Both of the cases use something similar to:
writer = pd.ExcelWriter('output.xlsx')
DF1.to_excel(writer,'Sheet1')
DF2.to_excel(writer,'Sheet2')
writer.save()
However, the problem is that I cannot afford to keep multiple dataframes in my memory at the same time since each of them are just too big. My data can be the complicated version of this:
df = pd.DataFrame(dict(A=list('aabb'), B=range(4), C=range(6,10)))
Out: A B C
0 a 0 6
1 a 1 7
2 b 2 8
3 b 3 9
I intend to use the items ['a', 'b', 'c']
in grplist
to perform some sort of calculation and eventually generate separate spreadsheets when data['A'] == a
through c
:
data = pd.read_csv(fileloc)
grplist = [['a','b','c'],['d','e','f']]
for groups, numbers in zip(grplist, range(1, 5)):
for category in groups:
clean = data[(data['A'] == category) & (data['B'] == numbers)]['C']
# --------My calculation to generate a dataframe--------
my_result_df = pd.DataFrame(my_result)
writer = ExcelWriter('my_path_of_excel')
my_resultdf.to_excel(writer, 'Group%s_%s' % (numbers, category[:4]))
writer.save()
gc.collect()
Sadly my code does not create multiple spreadsheets as groups, numbers
are looped through. I can only get the last result in the single spreadsheet lying in my excel. What can I do?
This is my very first post here. I hope I'm following every rules so this thread can end well. If anything need to be modified or improved, please kindly let me know. Thanks for your help :)