
I am processing a huge dataset (50 million rows) stored as CSV. I am trying to slice it and save the slices in Feather format, to save some memory when loading the data from Feather later.

As a workaround, I loaded the CSV file in chunks and later merged the chunks into one DataFrame.
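A minimal sketch of that chunked workaround, assuming `pd.read_csv` with `chunksize` was used (the question does not say how the chunks were read) and using a small in-memory CSV as a hypothetical stand-in for the real file. It also shows that every chunk after the first keeps its row positions from the file as its index, which is the same situation a slice is in.

```python
import io
import pandas as pd

# Hypothetical stand-in for the 50-million-row CSV file.
csv_text = "value\n" + "\n".join(str(i) for i in range(10)) + "\n"

chunks = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    # Each chunk's index continues from the previous one: 0..3, 4..7, 8..9.
    # Only the first chunk starts at 0, so the others would need
    # reset_index(drop=True) before being written to Feather on their own.
    chunks.append(chunk)

print([int(c.index[0]) for c in chunks])  # [0, 4, 8]

# ignore_index=True guarantees a fresh 0..n-1 index on the merged frame.
df = pd.concat(chunks, ignore_index=True)
print(len(df), df.index.equals(pd.RangeIndex(len(df))))  # 10 True
```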

This is what I have tried so far:

df[2000000:4000000].to_feather('name')

I have got the following error:

ValueError: feather does not support serializing a non-default index for the index; you can .reset_index() to make the index into column(s)

Then I tried resetting the index, but I still get the same error.
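The cause is that a slice keeps the original row labels, so `df[2000000:4000000]` has an index starting at 2000000 rather than 0. A minimal sketch on a small hypothetical frame; `has_default_index` is a hypothetical helper that mirrors the check `to_feather` performed in the pandas versions that raise this error. Nothing is written to disk, so pyarrow is not needed.

```python
import pandas as pd

def has_default_index(frame: pd.DataFrame) -> bool:
    """Hypothetical helper: True if the row labels are exactly 0, 1, ..., len(frame) - 1."""
    return frame.index.equals(pd.RangeIndex(len(frame)))

# Small stand-in for the 50-million-row frame (hypothetical data).
df = pd.DataFrame({"value": range(10)})

chunk = df[4:8]                       # positional slice, but labels stay 4..7
print(list(chunk.index))              # [4, 5, 6, 7]
print(has_default_index(chunk))       # False: this is what triggers the ValueError

fixed = chunk.reset_index(drop=True)  # labels renumbered to 0..3, old ones discarded
print(has_default_index(fixed))       # True
```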

MKJ
  • When you reset the index, did you add the `inplace=True` argument? `df.reset_index()` on its own returns a new DataFrame and does not change your df. – d_kennetz Sep 06 '18 at 19:54
  • I had the same problem and resetting the index fixed it, but as d_kennetz says, you have to either do it in place or assign the result back to your data frame. – Steven Mar 16 '19 at 19:26
  • This seems like a bug; I would suggest reporting it on GitHub at https://github.com/wesm/feather/issues – srishtigarg Dec 31 '20 at 08:27
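A minimal sketch of what the first two comments describe, on a small hypothetical frame: `reset_index` returns a new DataFrame, so its result has to be assigned back (or `inplace=True` used) before `to_feather` sees a default index.

```python
import pandas as pd

df = pd.DataFrame({"value": range(10)})  # hypothetical stand-in data
chunk = df[4:8]

chunk.reset_index(drop=True)             # returns a new frame, which is discarded here
print(chunk.index.tolist())              # [4, 5, 6, 7]: chunk itself is unchanged

chunk = chunk.reset_index(drop=True)     # assign the result back
print(chunk.index.tolist())              # [0, 1, 2, 3]
```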

2 Answers


Try with `.loc`:

df.loc[2000000:4000000].reset_index().to_feather("./myfeather.ftr")

You'll have to reset the index to save the DataFrame in Feather format. Works for me.
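A small sketch of two details in this answer, using a hypothetical 10-row frame: `.loc` slices by label and includes the end label, so `df.loc[2000000:4000000]` selects one more row than `df[2000000:4000000]`, and `reset_index()` without `drop=True` keeps the old labels as a column named `index`. The `to_feather` call is left commented out because it requires pyarrow to be installed.

```python
import pandas as pd

df = pd.DataFrame({"value": range(10)})  # hypothetical stand-in data

by_label = df.loc[4:8]       # label-based: includes label 8, so 5 rows
by_position = df.iloc[4:8]   # position-based: stops before position 8, so 4 rows
print(len(by_label), len(by_position))  # 5 4

kept = by_label.reset_index()                 # old labels become an "index" column
print(kept.columns.tolist())                  # ['index', 'value']

dropped = by_position.reset_index(drop=True)  # old labels are discarded
print(dropped.columns.tolist())               # ['value']

# With pyarrow installed, either frame can now be written, e.g.:
# dropped.to_feather("./myfeather.ftr")
```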

Lue Mar

Save the required slice of the data to CSV with `df.to_csv()`, load it back from the CSV, and then save it in Feather format. This method worked for me.
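A sketch of why the CSV round trip works, using an in-memory buffer instead of a file on disk and a small hypothetical frame: `read_csv` assigns a fresh 0..n-1 index. The `index=False` argument is an addition not in the answer; by default `to_csv` writes the index, which would come back as an extra `Unnamed: 0` column. The result matches a plain `reset_index(drop=True)`.

```python
import io
import pandas as pd

df = pd.DataFrame({"value": range(10)})  # hypothetical stand-in data
chunk = df[4:8]

# Round trip through CSV (an in-memory buffer stands in for a file).
buf = io.StringIO()
chunk.to_csv(buf, index=False)  # index=False: don't write the 4..7 labels
buf.seek(0)
reloaded = pd.read_csv(buf)     # read_csv assigns a fresh 0..n-1 index
print(reloaded.index.tolist())  # [0, 1, 2, 3]

# The same result without writing or parsing any CSV:
direct = chunk.reset_index(drop=True)
print(reloaded.equals(direct))  # True
```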

Adrian Mole
  • This seems like a roundabout way to do `df.reset_index()`; it will also take some time for very large dataframes, which is exactly the overhead Feather is meant to avoid. – Patrick Stetz Feb 09 '20 at 22:49