Iterate over first N rows in pandas

Question

What is the suggested way to iterate over the first N rows of a pandas DataFrame, like you would with the lines of a file? For example:

LIMIT = 100
with open('file', 'r') as f:
    for row_num, row in enumerate(f):
        if row_num == LIMIT:  # stop once LIMIT rows have been printed
            break
        print(row)

I was thinking to do something like:

for n in range(LIMIT):
    # df.loc looks up index labels, so this assumes the default 0..n-1 index
    print(df.loc[n].tolist())

Is there a built-in way to do this in pandas, though?

Is there any particular thing that you want to do with the first N rows? The reason for asking this question is that `df.iterrows()` is really slow and should be avoided if possible (usually it can be avoided). — gorjan, Dec 20 '18 at 16:52
check this post: https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas — Jessica, Dec 20 '18 at 16:53
@gorjan I'm doing some sort of `if` logic on each of these rows. — David542, Dec 20 '18 at 16:54
@gorjan do you want to post the more performant approach as an answer? Or am I already doing that with `df.loc`? — David542, Dec 20 '18 at 16:55
@David542 Let me know if what I posted is sufficient for you. — gorjan, Dec 20 '18 at 17:03
`df.head(100).itertuples()`, as my new answer suggests, would do it. — knh190, Apr 02 '19 at 07:33
Would you consider un-accepting my answer so that I can delete it? Clearly knh190's is much better. — timgeb, Jul 06 '20 at 13:21

knh190 · Accepted Answer · score 41 · 2019-04-02T07:43:00.957

Hasn't anyone suggested the simple solution?

for row in df.head(5).itertuples():
    # do something with row, e.g. row.Index or a column attribute
    print(row)

Take a peek at this post.
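A minimal self-contained sketch of the `head` + `itertuples` pattern, using a small made-up DataFrame (the columns `a` and `b` and the comparison rule are illustrative only). Each row arrives as a namedtuple, so the index is `row.Index` and columns are read as attributes.

```python
import pandas as pd

# Small illustrative DataFrame; the column names are made up
df = pd.DataFrame({"a": [1, 5, 3, 4, 5, 6], "b": [5, 4, 3, 2, 1, 0]})

N = 3
seen = []
for row in df.head(N).itertuples():
    # row is a namedtuple: row.Index holds the index label,
    # row.a and row.b hold the column values
    winner = "a" if row.a > row.b else "b"
    seen.append((row.Index, winner))

print(seen)
```

Column names that are not valid Python identifiers (or are repeated, or start with an underscore) are renamed to positional names such as `_1` in the namedtuple, so attribute access only works directly for "clean" column names.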

edited Apr 02 '19 at 07:43

answered Apr 02 '19 at 07:31


btw, `df.head(5).iterrows()` works too – oz19 Nov 03 '21 at 15:13
This is art, truly – some_programmer Mar 02 '23 at 13:00

user3062260 · Answer 2 · 2021-10-11T09:40:17.783

I know others have suggested `iterrows`, but no one has yet suggested combining `iloc` with `iterrows`. This lets you select whichever rows you want by row number:

for i, row in df.iloc[:100].iterrows():  # first 100 rows (positions 0-99)
    print(row)

That said, as others have noted, if speed is essential, an `apply` call or a vectorized operation would probably be better.

>>> df
     a    b
0  1.0  5.0
1  2.0  4.0
2  3.0  3.0
3  4.0  2.0
4  5.0  1.0
5  6.0  NaN
>>> for i, row in df.iloc[:3].iterrows():
...     print(row)
... 
a    1.0
b    5.0
Name: 0, dtype: float64
a    2.0
b    4.0
Name: 1, dtype: float64
a    3.0
b    3.0
Name: 2, dtype: float64
>>>
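`iloc` selects by position, whereas the question's `df.loc[n]` looks rows up by index label; the two only agree when the DataFrame has the default 0..n-1 index. A small sketch with a deliberately non-default index (the labels 10, 20, ... are made up) showing that `iloc` still takes the first N rows, and that it also accepts an explicit list of positions.

```python
import pandas as pd

# Index labels deliberately do not start at 0 (illustrative data)
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0]}, index=[10, 20, 30, 40])

N = 2
# Index labels of the first N rows, selected by position
first_n = [label for label, row in df.iloc[:N].iterrows()]
print(first_n)

# iloc also accepts a list of positions (here the first and last row)
picked = [row["a"] for _, row in df.iloc[[0, 3]].iterrows()]
print(picked)

# df.loc[0] raises KeyError here, because 0 is not an index label
try:
    df.loc[0]
except KeyError:
    print("df.loc[0] -> KeyError")
```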

score 3 · Answer 3 · answered Dec 20 '18 at 16:56


You have `values`, `itertuples` and `iterrows`, of which `itertuples` performs best, as benchmarked by fast-pandas.
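A sketch of the three approaches, each restricted to the first N rows (the data and N are illustrative). It uses `to_numpy()`, the current spelling of `.values`; note that the NumPy route drops the index and forces all columns into one dtype. The speed ranking is the one cited above from fast-pandas, not something this snippet measures.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [10, 20, 30, 40, 50]})
N = 3

# 1. Plain NumPy rows: no index, a single dtype for the whole array
from_values = [tuple(r) for r in df.to_numpy()[:N]]

# 2. itertuples: namedtuples; index=False leaves the index out
from_tuples = [(r.x, r.y) for r in df.head(N).itertuples(index=False)]

# 3. iterrows: (index, Series) pairs
from_rows = [(row["x"], row["y"]) for _, row in df.head(N).iterrows()]

print(from_values, from_tuples, from_rows)
```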

meW

@timgeb perhaps you can show each of the three approaches in code and I can answer your question? – David542 Dec 20 '18 at 16:58
friendly ping: @timgeb you can edit my answer further if you feel it is incomplete. I tried helping from my end. :) – meW Dec 20 '18 at 16:59

@meW no worries, I could have made myself clearer. Reminding us which method is faster is valuable, but it does not explain how to iterate only the first N rows. – timgeb Dec 20 '18 at 17:10

@timgeb I'll ensure answer completeness from next time :) – meW Dec 20 '18 at 17:13

score 2 · Answer 4 · answered Dec 20 '18 at 16:56


You can use `itertools.islice` to take the first n items from `iterrows`:

import itertools
limit = 5
for index, row in itertools.islice(df.iterrows(), limit):
    ...
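Because `islice` just stops pulling items early, it also behaves sensibly when `limit` is larger than the number of rows, and the same pattern works with `itertuples`. A small sketch with illustrative data:

```python
import itertools

import pandas as pd

df = pd.DataFrame({"col": ["a", "b", "c"]})

# limit larger than the DataFrame: islice simply stops when iterrows is exhausted
labels = [index for index, row in itertools.islice(df.iterrows(), 10)]
print(labels)

# The same pattern with itertuples, taking only the first 2 rows
values = [row.col for row in itertools.islice(df.itertuples(), 2)]
print(values)
```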

Joe Halliwell

score 1 · Answer 5 · answered Dec 20 '18 at 16:52


If you must iterate over the dataframe, you should use the iterrows() method:

for index, row in df.iterrows():
    ...

Tim

thanks, can you limit it within the `iterrows()` or do you need to use the `limit` approach? – David542 Dec 20 '18 at 16:53
You'll need to use a limit approach in one form or another. Because `iterrows` returns a generator, you can call the built-in `next()` on it `N` times to take the first N rows. – Tim Dec 20 '18 at 16:54
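A sketch of the `next()` suggestion: pull from the `iterrows()` generator at most N times. Passing a default (here `None`) to the built-in `next()` avoids a `StopIteration` when the DataFrame has fewer than N rows. The data is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"v": [7, 8, 9]})
N = 2

rows = df.iterrows()  # a generator of (index, Series) pairs
taken = []
for _ in range(N):
    item = next(rows, None)  # None once the generator is exhausted
    if item is None:
        break
    index, row = item
    taken.append((index, row["v"]))

print(taken)
```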

score 1 · Answer 6 · answered Dec 20 '18 at 17:00

Since you said that you want to apply something like an `if`, I would do the following:

import pandas as pd

limit = 2
df = pd.DataFrame({"col1": [1,2,3], "col2": [4,5,6], "col3": [7,8,9]})
first = df[:limit]                # first `limit` rows
first.loc[first["col3"] == 7]     # rows among them where col3 == 7

This selects the first two rows of the DataFrame and then returns those of them whose `col3` value equals 7. The point is that `iterrows` should be reserved for very specific situations; otherwise, the logic can usually be vectorized.

I don't know exactly what you are trying to achieve, so I just threw together a random example.
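To make the vectorization point concrete, here is a sketch comparing row-by-row `if` logic with the equivalent `np.where` call on the first `limit` rows. The rule ("label a row 'high' when col3 > 7") is made up for illustration; NumPy is assumed to be available, as it is wherever pandas is installed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6], "col3": [7, 8, 9]})
limit = 2
first = df.head(limit)

# Row-by-row version of an illustrative rule
looped = []
for _, row in first.iterrows():
    if row["col3"] > 7:
        looped.append("high")
    else:
        looped.append("low")

# Vectorized version of the same rule: one call over the whole column
vectorized = np.where(first["col3"] > 7, "high", "low").tolist()

print(looped, vectorized)
```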
