0

I have a large dataset of almost 4 million records. I'd like to export them in excel but since each sheets of an excel file can contain only 1 million records, I decided to split the dataframe and put each subset into an excel sheet. I used the below code:

df_split = np.array_split(promotion1, 4)
for i in df_split:
i.to_excel("result_promotion1.xlsx", index = False, sheet_name = i)

but that raised the below error: "'DataFrame' objects are mutable, thus they cannot be hashed"

any help would be appreciated.

mpy
  • 622
  • 9
  • 23

1 Answers1

0

The issue is with sheet_name = i. The sheet_name argument is expecting a string, but you're passing it the whole dataframe that you're trying to output to Excel.

The easiest way to resolve would probably to omit the argument and use the defaults (Sheet1, Sheet2, etc.). Alternatively, you could use enumerate to easily number the dataframes and split them into several excel files like so:

df_split = np.array_split(promotion1, 4)
for index, i in enumerate(df_split):
    filename = "result_promotion" + str(index) + ".xlsx"
    i.to_excel(filename, index = False)

Alternatively, this post (How to save a new sheet in an existing excel file, using Pandas?) goes into how to add a new sheet to an existing Excel file using pd.ExcelWriter.

Just to explain the error: since sheet_name expects a string and you're giving it a different object, pandas will attempt to hash the object to get a unique string representation of it instead. However, since DataFrames are mutable - you can change values in it, unlike a tuple - they cannot be hashed. See this post for a more detailed explanation on why hashable objects must be immutable.

  • really appreciate your explanation – mpy Sep 01 '19 at 06:48
  • actually I get the below error: "object of type 'int' has no len()" – mpy Sep 01 '19 at 08:10
  • more specific : " File "C:\ProgramData\Anaconda3\lib\site-packages\xlsxwriter\workbook.py", line 740, in _check_sheetname if len(sheetname) > 31:" – mpy Sep 01 '19 at 08:16
  • the len of my df(promotion1) is 3,978,996. so I guess it fit in multiple excel sheets. – mpy Sep 01 '19 at 08:26
  • I have also tried to change the type of sheet_name with this code" j.to_excel("result_promotion1.xlsx", index = False, sheet_name = str(i))" but it seems that each iteration, the sheet will overwrite. therefore the final excel has only one sheet with the final splited dataframe. – mpy Sep 01 '19 at 08:59
  • Yes, sorry, just edited my answer in response. The issue was that `index` was an integer and not a string. I just changed it such that it saves a new excel file for each split. – jonathantan Sep 01 '19 at 15:15
  • Thanks, however if I could write the all in one file, but different sheets, that would be awesome. – mpy Sep 02 '19 at 03:25