18

What is the suggested way to iterate over the rows in pandas like you would in a file? For example:

LIMIT = 100
for row_num, row in enumerate(open('file','r')):
    print (row)
    if row_num == LIMIT: break

I was thinking of doing something like:

for n in range(LIMIT):
    print (df.loc[n].tolist())

Is there a built-in way to do this in pandas, though?

David542

6 Answers

41

Hasn't anyone answered the simple solution?

for row in df.head(5).itertuples():
    print(row)  # do something with each row

Take a peek at this post.
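As a rough sketch of how the rows from `head(n).itertuples()` can be used: each row is a namedtuple, so the index is available as `row.Index` and the columns as attributes. The sample frame, the column names `a` and `b`, and the `seen` list are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical sample frame, used only for illustration.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [5.0, 4.0, 3.0, 2.0]})

limit = 2
seen = []  # collected here only so the result is easy to inspect
for row in df.head(limit).itertuples():
    # row.Index is the index label; columns are exposed as attributes.
    seen.append((row.Index, row.a, row.b))
    print(row.Index, row.a, row.b)
```

Attribute access only works for column names that are valid Python identifiers; other names are replaced with positional names by `itertuples`.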

knh190
12

I know others have suggested iterrows, but no one has yet suggested combining iloc with iterrows. This lets you select whichever rows you want by position:

for i, row in df.iloc[:101].iterrows():
    print(row)

Though, as others have noted, if speed is essential, an apply function or a vectorized function would probably be better.

>>> df
     a    b
0  1.0  5.0
1  2.0  4.0
2  3.0  3.0
3  4.0  2.0
4  5.0  1.0
5  6.0  NaN
>>> for i, row in df.iloc[:3].iterrows():
...     print(row)
... 
a    1.0
b    5.0
Name: 0, dtype: float64
a    2.0
b    4.0
Name: 1, dtype: float64
a    3.0
b    3.0
Name: 2, dtype: float64
>>>
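To make the note about vectorized functions concrete, here is a minimal sketch that assumes the per-row work is simply adding columns `a` and `b`. That task is hypothetical and chosen only for illustration; the frame mirrors the one printed above.

```python
import pandas as pd

# Same shape as the frame shown above.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "b": [5.0, 4.0, 3.0, 2.0, 1.0, float("nan")],
})

first = df.iloc[:3]  # first three rows by position

# Row-by-row version: one Python-level step per row.
loop_totals = [row["a"] + row["b"] for _, row in first.iterrows()]

# Vectorized version: a single column-wise operation.
vector_totals = (first["a"] + first["b"]).tolist()

print(loop_totals, vector_totals)
```

Both approaches give the same totals here; the vectorized form avoids building a Series object for every row.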
user3062260
3

You can iterate with values, itertuples, or iterrows; of these, itertuples performs best according to the fast-pandas benchmarks.

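For reference, the three approaches might look like the following when restricted to the first `n` rows. This is a sketch on a made-up frame; the variable names are assumptions, and it makes no claim about timings.

```python
import pandas as pd

# Hypothetical sample frame, used only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
n = 2
first = df.iloc[:n]  # limit to the first n rows by position

# 1. .values: rows of the underlying NumPy array, converted to lists.
from_values = first.values.tolist()

# 2. itertuples: one namedtuple per row (index dropped here).
from_itertuples = [tuple(row) for row in first.itertuples(index=False)]

# 3. iterrows: one (index, Series) pair per row.
from_iterrows = [(row["a"], row["b"]) for _, row in first.iterrows()]

print(from_values, from_itertuples, from_iterrows)
```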

meW
  • @timgeb perhaps you can show each of the three approaches in code and I can answer your question? – David542 Dec 20 '18 at 16:58
  • friendly ping: @timgeb you can edit my answer further if you feel it is incomplete. I tried helping from my end. :) – meW Dec 20 '18 at 16:59
  • @meW no worries, I could have made myself clearer. Reminding us which method is faster is valuable, but it does not explain how to iterate only the first N rows. – timgeb Dec 20 '18 at 17:10
  • @timgeb I'll ensure answer completeness from next time :) – meW Dec 20 '18 at 17:13
2

You can use itertools.islice to take the first n items from iterrows:

import itertools
limit = 5
for index, row in itertools.islice(df.iterrows(), limit):
    ...
Joe Halliwell
1

If you must iterate over the dataframe, you should use the iterrows() method:

for index, row in df.iterrows():
    ...
Tim
  • thanks, can you limit it within the `iterrows()` or do you need to use the `limit` approach? – David542 Dec 20 '18 at 16:53
  • You'll need to use a limit approach in one form or another. Because `iterrows` returns a generator, you can call the `next` method `N` times to take the first N rows. – Tim Dec 20 '18 at 16:54
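The `next`-based idea from the comment above could look roughly like this. It is a sketch on a made-up frame, and the `StopIteration` guard is an added assumption to cover frames with fewer than `n` rows.

```python
import pandas as pd

# Hypothetical sample frame, used only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
n = 2

rows = df.iterrows()  # generator yielding (index, Series) pairs
taken = []
for _ in range(n):
    try:
        index, row = next(rows)
    except StopIteration:
        break  # the frame had fewer than n rows
    taken.append((index, row["a"], row["b"]))

print(taken)
```

In practice `itertools.islice(df.iterrows(), n)` does the same thing with less bookkeeping.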
1

Since you said that you want to use something like an `if`, I would do the following:

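import pandas as pd

limit = 2
df = pd.DataFrame({"col1": [1,2,3], "col2": [4,5,6], "col3": [7,8,9]})
first_rows = df.iloc[:limit]  # first `limit` rows by position
first_rows[first_rows["col3"] == 7]  # mask built from the same slice, so the indexes align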

This selects the first two rows of the data frame, then returns those among them whose col3 value equals 7. The point is that you want to use iterrows only in very specific situations; otherwise, the solution can be vectorized.

I don't know exactly what you are trying to achieve, so I just threw in a random example.

gorjan