Is there an inplace version of df.head(max_rows) in pandas?

I need to limit the number of rows in my dataframe when there are too many to process.

At the moment I am doing df = df.head(10000000) but I think this is memory inefficient.

jpp
Donbeo
  • I did not understand the point of this, but `import pandas as pd` followed by `pd.options.display.max_rows` gives `60`. – Donbeo Oct 10 '18 at 15:21
  • Please look at the answer below, which is cleaner if you want to hardcode the number of rows to display; pandas supports this directly. – Karn Kumar Oct 10 '18 at 15:28
  • probable duplicate of [this](https://stackoverflow.com/questions/30876193/is-there-a-concise-way-to-show-all-rows-in-pandas-for-just-the-current-command?lq=1) – Karn Kumar Oct 10 '18 at 15:35

1 Answer

You can use pd.DataFrame.drop for an in-place operation:

n = 10000000
df.drop(df.index[n:], inplace=True)

But this may not help. As per @unutbu's comment:

df.drop(..., inplace=True) does modify df inplace, but due to the way inplace operations are implemented in Pandas, there is no real advantage to doing this over the more straight-forward reassignment to variable names. Personally I prefer functions that return values over functions that modify values, since with the former the assignment syntax makes it utterly clear what is getting modified.

This is explained further in Jeff's answer.
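To make the comparison concrete, here is a minimal sketch that runs both approaches on a toy frame. The column name `a`, the small `n`, and the variable names are my own illustrative assumptions, not part of the question; the sketch only checks that the same rows survive and does not measure memory.

```python
import pandas as pd

n = 3  # small stand-in for the 10000000 used in the question
df = pd.DataFrame({"a": range(10)})

# Plain reassignment style: keep the first n rows in a new object.
head_df = df.head(n)

# In-place style: drop every label from position n onwards.
dropped = df.copy()
dropped.drop(dropped.index[n:], inplace=True)

# Both frames should hold the same first n rows.
print(head_df["a"].tolist())
print(dropped["a"].tolist())
```

Since both versions end up with the same rows, the choice is mostly about readability, which is the point of the quoted comment.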

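In addition, note that this method does not work as intended with duplicated index labels: drop removes rows by label, so any of the first n rows that shares a label with a later row is dropped as well.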
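A small sketch of the duplicate-label problem, using a made-up four-row frame (the labels and values are purely illustrative). It contrasts the label-based drop with positional slicing via `iloc`, which does not look at labels at all.

```python
import pandas as pd

# Toy frame with a repeated label "x" (illustrative data only).
df = pd.DataFrame({"a": [10, 20, 30, 40]}, index=["x", "y", "x", "z"])
n = 2

# Label-based: df.index[n:] is ["x", "z"], so BOTH rows labelled "x" are removed.
by_label = df.drop(df.index[n:])

# Position-based: keeps exactly the first n rows regardless of labels.
by_position = df.iloc[:n]

print(len(by_label))     # fewer than n rows survive
print(len(by_position))  # exactly n rows survive
```

If the labels cannot be trusted to be unique, positional selection such as `df.head(n)` or `df.iloc[:n]` is probably the safer choice.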

jpp