Is there an inplace version of df.head(max_rows) in pandas?
I need to limit the number of rows in my dataframe when there are too many to process.
At the moment I am doing df = df.head(10000000), but I think this is memory inefficient.
You can use pd.DataFrame.drop for an in-place operation:
n = 10000000
df.drop(df.index[n:], inplace=True)
But this may not help. As per @unutbu's comment:
df.drop(..., inplace=True) does modify df inplace, but due to the way inplace operations are implemented in Pandas, there is no real advantage to doing this over the more straight-forward reassignment to variable names. Personally I prefer functions that return values over functions that modify values, since with the former the assignment syntax makes it utterly clear what is getting modified.
This is explained further in Jeff's answer.
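In addition, note that this method will not work correctly when the index contains duplicated labels: drop removes rows by label, so any of the first n rows that share a label with a later row are dropped as well.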
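To make the duplicated-index caveat concrete, here is a small sketch on toy data. The frames df and dup, the column name x, and n = 3 are made up for illustration; it shows drop matching head on a unique index, then removing too much on a duplicated one, with positional slicing via iloc as one possible alternative.

```python
import pandas as pd

n = 3  # illustrative row limit, not the 10000000 from the question

# Unique index: the in-place drop keeps the same rows as head(n).
df = pd.DataFrame({"x": range(6)})
df.drop(df.index[n:], inplace=True)
print(df["x"].tolist())

# Duplicated index labels: drop works by label, so rows among the
# first n that share a label with the tail are removed as well.
dup = pd.DataFrame({"x": range(6)}, index=[0, 1, 2, 0, 1, 5])
dropped = dup.drop(dup.index[n:])
print(len(dropped))  # fewer than n rows remain

# Positional slicing ignores labels and keeps exactly the first n rows.
trimmed = dup.iloc[:n]
print(len(trimmed))
```

With a duplicated index, reassigning the result of iloc[:n] (or head(n)) is the safer choice, which fits the point above that the in-place form offers no real advantage.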