
I am trying to pivot a pandas DataFrame read from a CSV file (16.3 MB):

```python
pd.pivot_table(df, index=['Column1', 'Column2'], columns=['Column3'], values=['value1', 'value2', 'value3'], aggfunc=np.sum)
```
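A minimal, self-contained sketch of the same call on a tiny made-up frame (only the column names come from the question; the data are invented). It uses the string alias `aggfunc="sum"`, which `pivot_table` accepts. It also shows that the size of the dense result can be estimated from group counts before pivoting: one row per observed (`Column1`, `Column2`) pair, and one column per combination of value column and observed `Column3` value. The 8 bytes per cell assumes float64 results.

```python
import pandas as pd

# Made-up stand-in for the CSV data; only the column names come from the question
df = pd.DataFrame({
    "Column1": ["a", "a", "b", "b"],
    "Column2": ["x", "y", "x", "x"],
    "Column3": ["p", "q", "p", "r"],
    "value1": [1.0, 2.0, 3.0, 4.0],
    "value2": [5.0, 6.0, 7.0, 8.0],
    "value3": [9.0, 10.0, 11.0, 12.0],
})

index, columns = ["Column1", "Column2"], ["Column3"]
values = ["value1", "value2", "value3"]

table = pd.pivot_table(df, index=index, columns=columns, values=values, aggfunc="sum")

# Dense result: one row per observed index combination and one column per
# (value column, observed Column3 value) pair. The final table can be
# narrower if pivot_table drops all-NaN columns (dropna=True by default).
n_rows = df.groupby(index).ngroups
n_cols = df.groupby(columns).ngroups * len(values)
est_bytes = n_rows * n_cols * 8  # assuming float64 cells

print(table.shape, (n_rows, n_cols), est_bytes)  # (3, 9) (3, 9) 216
```

On the real data, computing `n_rows * n_cols * 8` first shows whether the dense table can exist at all; pandas may also need temporary arrays of similar size while building it.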

This is the error I get:

```
ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
```
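The message itself comes from NumPy: it is raised when an array's element count times its item size would not fit in NumPy's pointer-sized integer (`np.intp`), and that check happens before any memory is requested. So it most likely means pandas tried to create an astronomically large dense result, not merely one that exceeds the installed RAM. The sketch below triggers the same check on purpose with a deliberately oversized element count; it allocates nothing.

```python
import numpy as np

# Element count that still fits in np.intp, but whose size in bytes
# (count * 8 for float64) overflows the largest representable byte count
count = np.iinfo(np.intp).max // 8 + 1

message = None
try:
    np.empty(count, dtype=np.float64)
except ValueError as exc:
    # NumPy rejects the shape during its size check, before allocating
    message = str(exc)

print(message)
```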

Any suggestions are highly appreciated.

eiram_mahera
  • Can you attach `df.head()` to your question? – iDrwish Sep 11 '19 at 12:14
  • Do you still get the error if you change `aggfunc=np.sum` to `aggfunc='sum'`? – jezrael Sep 11 '19 at 12:14
  • @jezrael I would be very surprised if that worked; can you explain the rationale a bit? – iDrwish Sep 11 '19 at 12:16
  • @iDrwish I hoped pandas would return a nicer error than this NumPy one ;) – jezrael Sep 11 '19 at 12:17
  • Hmm, I think the problem may be numbers that are too large, because `sum` is used; check [this](https://stackoverflow.com/a/51621963) – jezrael Sep 11 '19 at 12:22
  • The pivot table array will have a size of `len(df.groupby(['Column1','Column2'])) * len(df.groupby(['Column3']))`; this size is obviously too big for your memory. – Stef Sep 11 '19 at 12:44
  • @Stef, yes, the table is huge. Can you please suggest an alternative? – eiram_mahera Sep 12 '19 at 15:55
  • If the whole table doesn't fit into memory, maybe you can do it in parts, by groups of columns ([dask pivot table](https://docs.dask.org/en/latest/dataframe-api.html?highlight=pivot#dask.dataframe.DataFrame.pivot_table) is of no help here, as it only supports a single index column) – Stef Sep 12 '19 at 18:26
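One way to do the pivot in parts, as suggested in the last comment, is to split the distinct `Column3` values into batches and pivot each batch separately, handling every partial table (for example, writing it to disk) before building the next. This is a minimal sketch under that assumption; `pivot_in_chunks` and `batch_size` are hypothetical names, and the demo data are made up.

```python
import pandas as pd

def pivot_in_chunks(df, index, column, values, batch_size):
    """Hypothetical helper: pivot one batch of `column` values at a time.

    Each yielded frame covers only the rows whose `column` value is in the
    current batch, so a caller that handles each part before asking for the
    next holds only one partial table at a time.
    """
    keys = df[column].dropna().unique()
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        part = df[df[column].isin(batch)]
        yield pd.pivot_table(part, index=index, columns=[column],
                             values=values, aggfunc="sum")

# Made-up demo data; only the column names come from the question
df = pd.DataFrame({
    "Column1": ["a", "a", "b", "b"],
    "Column2": ["x", "y", "x", "x"],
    "Column3": ["p", "q", "p", "r"],
    "value1": [1.0, 2.0, 3.0, 4.0],
    "value2": [5.0, 6.0, 7.0, 8.0],
    "value3": [9.0, 10.0, 11.0, 12.0],
})

shapes = []
for part in pivot_in_chunks(df, ["Column1", "Column2"], "Column3",
                            ["value1", "value2", "value3"], batch_size=2):
    # On real data, write `part` out here (e.g. part.to_csv(...)) instead of keeping it
    shapes.append(part.shape)

print(shapes)  # [(3, 6), (1, 3)]
```

Batching on `Column3` makes every chunk a vertical slice of the final table. The same (`Column1`, `Column2`) rows can appear in several chunks, so if the parts are ever recombined they have to be joined on the index, not stacked.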
