Is there a better way to iterate over every row of a dataframe?

Question

I've been using this loop to run a different function on each value of a dataframe, where xxx is a two-column dataframe:

for i in range(1, len(xxx)):
    row = xxx[i-1:i]
    do_something(row['value1'])
    do_something_else(row['value2'])

This works fine, but I've always wondered whether there is a more readable way to do the same thing.

Please answer with concepts or libraries that I should check out.

Does this answer your question? [How to iterate over rows in a DataFrame in Pandas](https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas) — Stas Buzuluk, Sep 22 '20 at 16:16
If you need to iterate over the rows of your data frame, you should seriously question whether a data frame is the best representation for your data. Almost all uses are better solved by some form of vectorization: apply a function to all rows of the data frame (i.e. let the run-time system manage your iteration). — Prune, Sep 22 '20 at 16:44
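
To illustrate the two ideas raised in the comments above, here is a minimal sketch. The column names `value1` and `value2` come from the question; the sample data and the arithmetic are invented for illustration. It shows a more readable row loop with `itertuples`, followed by vectorized operations that avoid the loop entirely.

```python
import pandas as pd

# Hypothetical stand-in for the asker's two-column frame `xxx`.
xxx = pd.DataFrame({"value1": [1, 2, 3], "value2": [4, 5, 6]})

# Readable row iteration: each row arrives as a namedtuple,
# so columns are available as attributes.
for row in xxx.itertuples(index=False):
    print(row.value1, row.value2)

# Vectorized alternative: operate on whole columns at once
# and let pandas handle the iteration internally.
doubled = xxx["value1"] * 2
total = xxx["value1"] + xxx["value2"]
```

Whether the vectorized form is an option depends on what `do_something` and `do_something_else` actually do; functions with side effects may still need an explicit loop.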

score 2 · Accepted Answer · answered Sep 22 '20 at 16:12

Try this:

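import pandas as pd

# Build a small two-column frame: 'A' holds numbers, 'B' holds letters
df = pd.DataFrame([[1, 2, 3, 4], ['A', 'B', 'C', 'D']]).T
df.columns = ['A', 'B']

def func(x):
    return x ** 2

# Apply func to every value in column 'A' and write the results back
df['A'] = list(map(func, df['A']))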

answered Sep 22 '20 at 16:12

Vaziri-Mahmoud

score 1 · Answer 2 · answered Sep 22 '20 at 16:19

You can apply a function along an axis of the DataFrame (rows or columns) with apply:

pandas.DataFrame.apply

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)
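
As a rough sketch of how this might look for the two-column frame in the question (the sample data and the helper `handle_row` are made up for illustration): `axis=1` passes each row to the function as a Series, while the default `axis=0` passes each column.

```python
import pandas as pd

# Hypothetical stand-in for the asker's two-column frame `xxx`.
xxx = pd.DataFrame({"value1": [1, 2, 3], "value2": [10, 20, 30]})

def handle_row(row):
    # With axis=1, `row` is a Series indexed by column name.
    return row["value1"] + row["value2"]

# One result per row.
row_totals = xxx.apply(handle_row, axis=1)

# With the default axis=0, the function receives each column instead.
col_max = xxx.apply(lambda col: col.max())
```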

answered Sep 22 '20 at 16:19

Jean-Marc Billod

score 1 · Answer 3 · answered Sep 22 '20 at 16:19

You may also try using a lambda function along with an apply method like this:

Let's say that you have a function that converts an element to a string and then capitalizes that string.

def capitalize(cell):
    return str(cell).capitalize()

You can then apply that function to every value in a chosen column.

df["Column"].apply(lambda x: capitalize(x))

Grayrigel · Answer 4 · 2020-09-22T17:02:02.567

One potential solution is to map regular functions or lambda functions onto the columns of the dataframe, which is much faster and more efficient than a loop (e.g. df.iterrows()).
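
To make the comparison concrete, here is a small sketch (the frame and its values are invented for illustration) that squares a column once with an explicit `df.iterrows()` loop and once with `Series.map`. Both produce the same values; the `map` version is shorter, and on larger frames it will usually be quicker too, although exact timings depend on the data and the pandas version.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0]})

# Explicit loop: iterrows yields (index, row) pairs, one Series per row.
looped = []
for _, row in df.iterrows():
    looped.append(row["A"] ** 2)

# Same computation with map: the function is applied to each value of the column.
mapped = df["A"].map(lambda x: x ** 2)
```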

Here is a summary of efficient dataframe/series manipulation methods, based on an answer here:

map works for Series ONLY
applymap works for DataFrames ONLY
apply works for BOTH

Here is a toy example:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(4, 2), columns=list('AB'))
print(df)

def square(x):
    return x**2

# map with a lambda function (Series only)
print('Result of mapping with a lambda function')
df['A'] = df['A'].map(lambda x: x**2)
print(df)

# map with a regular function (Series only)
print('Result of mapping with a regular function')
df['C'] = df['A'].map(square)
print(df)

# applymap: the function is called on every element of the DataFrame
print('Result of applymap with a regular function')
df1 = df.applymap(square)
print(df1)

# apply: the function is called on each column of the DataFrame
print('Result of applying with a regular function')
df2 = df.apply(square)
print(df2)

Output:

          A         B
0 -0.030899 -2.206942
1  0.080991  0.049431
2  1.190754 -0.101161
3  0.794870 -0.969503

Result of mapping with a lambda function
          A         B
0  0.000955 -2.206942
1  0.006560  0.049431
2  1.417894 -0.101161
3  0.631818 -0.969503

Result of mapping with a regular function
          A         B             C
0  0.000955 -2.206942  9.115775e-07
1  0.006560  0.049431  4.302793e-05
2  1.417894 -0.101161  2.010425e+00
3  0.631818 -0.969503  3.991945e-01

Result of applymap with a regular function
              A         B             C
0  9.115775e-07  4.870592  8.309735e-13
1  4.302793e-05  0.002443  1.851403e-09
2  2.010425e+00  0.010234  4.041807e+00
3  3.991945e-01  0.939936  1.593563e-01

Result of applying with a regular function
              A         B             C
0  9.115775e-07  4.870592  8.309735e-13
1  4.302793e-05  0.002443  1.851403e-09
2  2.010425e+00  0.010234  4.041807e+00
3  3.991945e-01  0.939936  1.593563e-01
