90

I have a pandas DataFrame with 1000 rows and 10 columns. I would simply like to slice it and take the first 10 rows. How can I do this? I've been trying this:

>>> df.shape
(1000,10)
>>> my_slice = df.ix[10,:]
>>> my_slice.shape
(10,)

Shouldn't my_slice be the first ten rows, i.e. a 10 x 10 DataFrame? How can I get the first ten rows, so that my_slice is a 10 x 10 DataFrame object? Thanks.

Henry Ecker
turtle

5 Answers

137

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html?highlight=head#pandas.DataFrame.head

df2 = df.head(10)

should do the trick: head(10) returns the first 10 rows as a DataFrame.
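As a quick check, here is a minimal sketch. The frame built here is a made-up stand-in for the 1000 x 10 DataFrame in the question, not the asker's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 1000 x 10 frame from the question.
df = pd.DataFrame(np.arange(10_000).reshape(1000, 10))

df2 = df.head(10)   # first 10 rows, all columns
print(df2.shape)    # (10, 10)
```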

RuiDC
103

As a convenience, you can also slice the DataFrame directly:

df[:10]
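A small sketch of what this shorthand does, assuming a toy frame with a string index (the column name and labels are invented for illustration). An integer slice inside the square brackets selects rows by position, not by label:

```python
import pandas as pd

# Toy frame with a non-default index, made up for this example.
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=["d", "c", "b", "a"])

first_two = df[:2]            # integer slice -> first two rows by position
print(list(first_two.index))  # ['d', 'c']
```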

Wes McKinney
17

There are several ways to do this. Three options are shown below.

To keep the original dataframe df intact, each slice is assigned to df_new.

The Time Comparison section at the end shows the execution time of each option on a random dataframe.


Option 1

df_new = df[:10] # Option 1.1

# or

df_new = df[0:10] # Option 1.2

Option 2

Using head

df_new = df.head(10)

For negative values of n, head returns all rows except the last |n| rows, which is equivalent to df[:n]. For example, df.head(-3) is the same as df[:-3].
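The negative-n behaviour of head can be checked on a small made-up frame (the column name "a" is only illustrative):

```python
import pandas as pd

# Five-row toy frame, invented for this check.
df = pd.DataFrame({"a": range(5)})

all_but_last_two = df.head(-2)   # drops the last 2 rows
print(len(all_but_last_two))     # 3
print(all_but_last_two.equals(df[:-2]))  # True
```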


Option 3

Using iloc

df_new = df.iloc[:10] # Option 3.1

# or

df_new = df.iloc[0:10] # Option 3.2

Time Comparison

The timings below were measured with time.perf_counter().

       method                   time
0  Option 1.1 0.00000120000913739204
1  Option 1.2 0.00000149995321407914
2    Option 2 0.00000170001294463873
3  Option 3.1 0.00000120000913739204
4  Option 3.2 0.00000350002665072680


Execution time depends on many factors, so these numbers will vary with the dataframe used, the machine, and more.
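Single perf_counter readings at this scale are noisy, so one way to get steadier numbers is to repeat each call many times. The following is a rough sketch using the standard-library timeit module rather than the method used for the table above; the random frame, the candidates dictionary, and the repeat counts are all assumptions chosen for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Random 1000 x 10 frame, assumed here as a stand-in for the test data.
df = pd.DataFrame(np.random.rand(1000, 10))

# Hypothetical mapping of label -> zero-argument callable to time.
candidates = {
    "df[:10]": lambda: df[:10],
    "df.head(10)": lambda: df.head(10),
    "df.iloc[:10]": lambda: df.iloc[:10],
}

results = {}
for name, fn in candidates.items():
    # Best of 5 runs of 1000 calls each, converted to seconds per call.
    best = min(timeit.repeat(fn, number=1000, repeat=5)) / 1000
    results[name] = best
    print(f"{name:<14} {best:.2e} s per call")
```

Taking the minimum over several repeats is a common way to reduce the influence of background activity, but the ranking can still change between machines and pandas versions.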


Notes:

  • Replace 10 in the operations above with however many rows you want. For example

    df_new = df[:5]
    

    will return a dataframe with the first 5 rows.

  • There are other ways to measure execution time. See: How do I get time of a Python program's execution?

  • The same slices can also be applied column by column through a lambda function, such as the following

    df_new = df.apply(lambda x: x[:10])
    
    # or
    
    df_new = df.apply(lambda x: x.head(10))
    

    Note, however, that there are strong opinions on the use of .apply(), and it is not needed for this case.

Gonçalo Peres
14

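df.ix[10,:] gives you all the columns of the single row labeled 10, which is why the result has shape (10,). In your case you want everything up to and including the row labeled 9, which is df.ix[:9,:]. Since .ix was deprecated in pandas 0.20 and removed in 1.0, the label-based equivalent today is df.loc[:9, :]. Note that the right end of a label slice is inclusive: http://pandas.sourceforge.net/gotchas.html#endpoints-are-inclusive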
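The difference between selecting one label and slicing a range can be sketched with .loc and .iloc, which replaced .ix in current pandas. The zero-filled frame below is an assumed stand-in with the same shape and default integer index as in the question.

```python
import numpy as np
import pandas as pd

# Assumed stand-in: 1000 x 10 frame with the default 0..999 index.
df = pd.DataFrame(np.zeros((1000, 10)))

row = df.loc[10, :]          # single label -> one row, returned as a Series
by_label = df.loc[:9, :]     # label slice, endpoint 9 is included
by_position = df.iloc[:10]   # positional slice, endpoint 10 is excluded

print(row.shape)          # (10,)
print(by_label.shape)     # (10, 10)
print(by_position.shape)  # (10, 10)
```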

Daniel
4

df[:n] returns the first n rows of the DataFrame df.

Maifee Ul Asad
Shifu