When iterating through rows in a dataframe in Pandas, is there a difference in performance between using:
    for index in df.index:
        ....
and:
    for index, row in df.iterrows():
        ....
Which one should be preferred?
Pandas is significantly faster at column-wise operations, so consider transposing your dataset and carrying out your operation column by column. If you absolutely need to iterate through rows and want to keep it simple, you can use:
    for row in df.itertuples():
        print(row.column_1)
df.itertuples() is significantly faster than both df.iterrows() and iterating over the index. However, there are faster ways to perform row-wise operations. Check out this answer for an overview.
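For anyone who wants to check this on their own data, here is a minimal sketch that runs the three row-iteration styles discussed in this thread on the same frame and times them. The sample data, the column names and the helper function names are all made up for illustration, and the timings will depend on your machine and pandas version, so treat the printed numbers as a rough guide only.

```python
import timeit

import pandas as pd

# Hypothetical sample data; column names are assumptions for this sketch.
df = pd.DataFrame({"column_1": range(1000), "column_2": range(1000, 2000)})


def sum_with_index(frame):
    """Iterate over the index and look each value up with .loc."""
    total = 0
    for index in frame.index:
        total += frame.loc[index, "column_1"]
    return total


def sum_with_iterrows(frame):
    """Iterate with iterrows(), which builds a Series for every row."""
    total = 0
    for index, row in frame.iterrows():
        total += row["column_1"]
    return total


def sum_with_itertuples(frame):
    """Iterate with itertuples(), which yields lightweight namedtuples."""
    total = 0
    for row in frame.itertuples():
        total += row.column_1
    return total


if __name__ == "__main__":
    # Time each approach a few times; absolute numbers vary by environment.
    for fn in (sum_with_index, sum_with_iterrows, sum_with_itertuples):
        seconds = timeit.timeit(lambda: fn(df), number=5)
        print(f"{fn.__name__}: {seconds:.4f} s")
```

All three functions compute the same total, so the only thing that differs between them is the iteration overhead.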
When we loop over df.index, getting the data for each row requires an additional .loc lookup:
    for index in df.index:
        value = df.loc[index, 'col']
When we use df.iterrows(), the row is already available:
    for index, row in df.iterrows():
        value = row['col']
Since you are already using pandas, neither of them is recommended, unless you need a function that cannot be vectorized.
However, IMO, I prefer df.index.
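To illustrate the point about vectorization, here is a small sketch that computes the same result once with a df.index loop and once with a single column-wise expression. The frame and the column names ("price", "quantity") are hypothetical and only there to make the example runnable.

```python
import pandas as pd

# Hypothetical data; the column names are assumptions for this sketch.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "quantity": [1, 2, 3]})

# Row-by-row version: one .loc lookup per value, executed in Python.
totals_loop = []
for index in df.index:
    totals_loop.append(df.loc[index, "price"] * df.loc[index, "quantity"])

# Vectorized version: one expression over whole columns, no Python loop.
totals_vectorized = df["price"] * df["quantity"]

print(totals_loop)
print(totals_vectorized.tolist())
```

When the operation can be written like the second version, it is usually the one to reach for first; the loop is the fallback for logic that cannot be expressed column-wise.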