How to slice a pandas DataFrame by position?

Question

I have a pandas DataFrame with 1000 rows and 10 columns. I would simply like to slice the DataFrame and take the first 10 rows. How can I do this? I've been trying this:

```
>>> df.shape
(1000, 10)
>>> my_slice = df.ix[10,:]
>>> my_slice.shape
(10,)
```

Shouldn't my_slice be the first ten rows, i.e. a 10 x 10 DataFrame? How can I get the first ten rows, such that my_slice is a 10 x 10 DataFrame object? Thanks.

score 137 · Accepted Answer · answered Aug 18 '12 at 20:27

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html?highlight=head#pandas.DataFrame.head

```
df2 = df.head(10)
```

should do the trick
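A minimal sketch of the call above, assuming a made-up 1000 x 10 DataFrame of random numbers like the one in the question (the column names c0..c9 are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1000 rows x 10 columns, as in the question
df = pd.DataFrame(np.random.rand(1000, 10), columns=[f"c{i}" for i in range(10)])

# head(10) returns the first 10 rows as a DataFrame, keeping every column
df2 = df.head(10)

print(df2.shape)  # (10, 10)
```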

RuiDC

score 103 · Answer 2 · answered Sep 09 '12 at 19:21

As a convenience, you can also do:

```
df[:10]
```

Wes McKinney

This seems to not copy the column names for me. – Ruben Aug 23 '21 at 12:14
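For what it is worth, an integer slice such as df[:10] is expected to keep the column labels. A quick check, assuming a small made-up DataFrame with named columns (the names df and first_ten are illustrative):

```python
import pandas as pd

# Hypothetical 20-row DataFrame with named columns
df = pd.DataFrame({
    "a": range(20),
    "b": range(20, 40),
    "c": list("abcdefghijklmnopqrst"),
})

# An integer slice inside [] selects rows by position
first_ten = df[:10]

# The column labels carry over unchanged
print(list(first_ten.columns))  # ['a', 'b', 'c']
print(first_ten.shape)          # (10, 3)
```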

Gonçalo Peres · Answer 3 · 2022-10-27T11:50:42.207

There are various ways to do that. Below we will go through three of them.

To keep the original dataframe df unchanged, we assign the sliced dataframe to df_new.

At the end, in the section Time Comparison, we compare the execution times of the options on a random dataframe.

Option 1

```
df_new = df[:10]  # Option 1.1

# or

df_new = df[0:10]  # Option 1.2
```

Option 2

Using head

```
df_new = df.head(10)
```

For negative values of n, this function returns all rows except the last |n| rows, equivalent to df[:n] [Source].
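A small illustration of the negative case, assuming a made-up five-row DataFrame (the names df and trimmed are illustrative):

```python
import pandas as pd

# Hypothetical 5-row DataFrame
df = pd.DataFrame({"x": [10, 20, 30, 40, 50]})

# n = -2: every row except the last 2
trimmed = df.head(-2)
print(trimmed["x"].tolist())  # [10, 20, 30]

# Same rows as the positional slice df[:n] with n = -2
print(trimmed.equals(df[:-2]))  # True
```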

Option 3

Using iloc

```
df_new = df.iloc[:10]  # Option 3.1

# or

df_new = df.iloc[0:10]  # Option 3.2
```
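Because .iloc is purely positional, it returns the first rows even when the index labels are not 0, 1, 2, and so on. A sketch assuming a made-up DataFrame whose index starts at 100; the contrast with the label-based .loc is included only to show the difference:

```python
import pandas as pd

# Hypothetical DataFrame whose index labels are 100, 101, ..., 119
df = pd.DataFrame({"v": range(20)}, index=range(100, 120))

# Positions 0..9, whatever their labels happen to be
first_ten = df.iloc[:10]
print(len(first_ten))                              # 10
print(first_ten.index[0], first_ten.index[-1])     # 100 109

# .loc slices by label instead: no label is <= 10 here, so nothing is selected
print(len(df.loc[:10]))  # 0
```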

Time Comparison

For this specific case, time.perf_counter() was used to measure the execution time of each option.

       method               time (s)
0  Option 1.1 0.00000120000913739204
1  Option 1.2 0.00000149995321407914
2    Option 2 0.00000170001294463873
3  Option 3.1 0.00000120000913739204
4  Option 3.2 0.00000350002665072680

As many factors affect execution time (the dataframe used, the hardware, the pandas version, and more), these results may differ from one setup to another.
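A single time.perf_counter() measurement at the microsecond scale is noisy. Below is a sketch of a more repeatable comparison using the standard-library timeit module, assuming a made-up 1000 x 10 random DataFrame; the options dictionary and the timings table are illustrative names, and no numbers are shown because they vary by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical random DataFrame, 1000 rows x 10 columns
df = pd.DataFrame(np.random.rand(1000, 10))

# Each option from above, wrapped in a callable so timeit can run it repeatedly
options = {
    "Option 1.1": lambda: df[:10],
    "Option 1.2": lambda: df[0:10],
    "Option 2": lambda: df.head(10),
    "Option 3.1": lambda: df.iloc[:10],
    "Option 3.2": lambda: df.iloc[0:10],
}

results = []
for name, fn in options.items():
    # Best of 5 repeats, each running 1000 calls, divided to get a per-call estimate
    per_call = min(timeit.repeat(fn, number=1000, repeat=5)) / 1000
    results.append({"method": name, "time (s)": per_call})

timings = pd.DataFrame(results)
print(timings)
```

Taking the minimum over repeats is a common way to reduce the influence of background noise on short benchmarks.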

Notes:

The 10 in the operations above can be replaced with any number of rows. For example
```
df_new = df[:5]
```
will return a dataframe with the first 5 rows.
There are other ways to measure execution time. For more of them, read this: How do I get time of a Python program's execution?
One can also wrap the previous options in a lambda function passed to .apply(), such as the following
```
df_new = df.apply(lambda x: x[:10])

# or

df_new = df.apply(lambda x: x.head(10))
```
Note, however, that .apply() calls the function once per column, so for this case it is unnecessary and slower than slicing the dataframe directly.
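To see that the .apply() variant gives the same rows as a direct slice, here is a quick check assuming a made-up 50 x 3 DataFrame (via_apply and direct are illustrative names):

```python
import numpy as np
import pandas as pd

# Hypothetical 50 x 3 DataFrame
df = pd.DataFrame(np.arange(150).reshape(50, 3), columns=["a", "b", "c"])

# apply() calls the lambda once per column; each x is a Series
via_apply = df.apply(lambda x: x.head(10))

# A single direct slice produces the same 10 rows in one step
direct = df.iloc[:10]
print(via_apply.equals(direct))  # True
```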

score 14 · Answer 4 · answered Aug 19 '12 at 09:02

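df.ix[10,:] gives you all the columns of the single row labelled 10 (with the default integer index, that is the 11th row), which is why the result has shape (10,). In your case you want everything up to and including label 9, which is df.ix[:9,:]. Note that with label-based indexing the right end of the slice range is inclusive: http://pandas.sourceforge.net/gotchas.html#endpoints-are-inclusive (.ix was later deprecated and removed in pandas 1.0; the label-based equivalent is df.loc[:9,:] and the positional one is df.iloc[:10].)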
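Since .ix no longer exists in current pandas, the sketch below reproduces the question's result and the fix with .loc, which behaves like .ix here because the index is the default integer one, and adds the positional .iloc equivalent. It assumes a made-up 1000 x 10 random DataFrame; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

# Hypothetical 1000 x 10 DataFrame with the default integer index 0..999
df = pd.DataFrame(np.random.rand(1000, 10))

# A single label selects one row, returned as a Series of its 10 values
row = df.loc[10, :]
print(row.shape)  # (10,)

# Label slices include the end label, so :9 covers labels 0..9
first_ten_loc = df.loc[:9, :]
# Positional slices exclude the end, so :10 covers positions 0..9
first_ten_iloc = df.iloc[:10]

print(first_ten_loc.shape, first_ten_iloc.shape)  # (10, 10) (10, 10)
print(first_ten_loc.equals(first_ten_iloc))      # True
```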

Daniel

this is deprecated – Rocketq Mar 01 '18 at 14:37
.ix is not working, you should delete it; "df[:10]" is enough – Abdelsalam Hamdi May 25 '21 at 05:51

score 4 · Answer 5 · edited Oct 25 '21 at 12:03

DataFrame[:n] will return the first n rows.
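A small illustration, assuming a made-up three-row DataFrame. As with ordinary Python slicing, an n larger than the number of rows is not an error; you simply get every row:

```python
import pandas as pd

# Hypothetical 3-row DataFrame
df = pd.DataFrame({"x": [1, 2, 3]})

n = 2
print(df[:n]["x"].tolist())  # [1, 2]

# Asking for more rows than exist does not raise
n = 10
print(len(df[:n]))  # 3
```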

edited by Maifee Ul Asad

answered Apr 23 '20 at 16:06

Shifu
