
I've been using Pandas for almost six months, and in my view one of the greatest debates has been about iterating over DataFrames — whether through `.iterrows()`, `.apply()`, or a list comprehension — to compute new data.

I have been advised many times to use `.loc` or similar accessors to write data whenever possible. The problem is that when I have many conditionals, what I used to solve in one line of code now requires many lines of `.iloc` to fill in the data.

In a nutshell: does it pay off to always avoid iteration and write much longer code, even when the DataFrames are not huge?
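To make the trade-off concrete, here is a minimal sketch (the column names and grading rules are made up for illustration) contrasting a row-by-row loop with a vectorized `np.select`, which handles many conditionals in a single assignment:

```python
import numpy as np
import pandas as pd

# Hypothetical data: grade a score column with several conditionals.
df = pd.DataFrame({"score": [35, 72, 88, 55, 91]})

# Iteration style: one Python-level branch per row.
grades = []
for _, row in df.iterrows():
    if row["score"] >= 90:
        grades.append("A")
    elif row["score"] >= 70:
        grades.append("B")
    elif row["score"] >= 50:
        grades.append("C")
    else:
        grades.append("F")
df["grade_loop"] = grades

# Vectorized style: np.select evaluates the conditions column-wise,
# picking the first matching choice per row.
conditions = [df["score"] >= 90, df["score"] >= 70, df["score"] >= 50]
df["grade_vec"] = np.select(conditions, ["A", "B", "C"], default="F")

assert (df["grade_loop"] == df["grade_vec"]).all()
```

The vectorized version is not necessarily shorter, but it stays one statement per output column no matter how many rows there are, which is where the speed difference comes from.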

Can anybody recommend some articles that explain this efficiency trade-off?

Daniel Arges
  • You can take a look here: [How to iterate over rows in a DataFrame in Pandas](https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas) – ThePyGuy Mar 24 '21 at 17:01
  • Shorter code is not (always) better code... – Tomerikoo Mar 24 '21 at 17:03
  • I would begrudgingly say that for a small dataset, you can use `.iterrows` but as you probably know, it returns a Series for each row so it is substantially slower than using indexing. Keep in mind that if your DataFrame ever gets larger, the performance of your code will suffer — try running some benchmarking tests on iterrows versus other methods to get an idea about the difference in performance – Derek O Mar 24 '21 at 17:11
  • @DerekO: I guess it's better to use `.itertuples` in place of `iterrows` – Pygirl Mar 24 '21 at 17:16
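Following the benchmarking suggestion above, here is an illustrative micro-benchmark sketch (column names and sizes are arbitrary; absolute timings vary by machine) comparing `iterrows`, `itertuples`, and a vectorized column operation on the same task:

```python
import timeit

import numpy as np
import pandas as pd

# Arbitrary example frame: sum two numeric columns row by row.
df = pd.DataFrame(np.random.rand(10_000, 2), columns=["a", "b"])

def with_iterrows():
    # iterrows yields an (index, Series) pair per row -- slowest.
    return [row["a"] + row["b"] for _, row in df.iterrows()]

def with_itertuples():
    # itertuples yields lightweight namedtuples -- much faster.
    return [t.a + t.b for t in df.itertuples(index=False)]

def vectorized():
    # Whole-column arithmetic runs in C, no Python-level loop.
    return (df["a"] + df["b"]).tolist()

for fn in (with_iterrows, with_itertuples, vectorized):
    elapsed = timeit.timeit(fn, number=3)
    print(f"{fn.__name__:15s} {elapsed:.3f}s")
```

All three produce the same values; typically the vectorized version is orders of magnitude faster than `iterrows`, with `itertuples` in between.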

1 Answer


There is a great article about different ways of iterating through a dataframe, and how much time each method takes. I personally found it very helpful. Take a look: https://towardsdatascience.com/apply-function-to-pandas-dataframe-rows-76df74165ee4
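The core comparison that kind of article makes can be sketched in a few lines (the `price`/`qty` columns here are invented for illustration): `.apply(axis=1)` calls a Python function once per row, while the vectorized form operates on whole columns at once.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Row-wise apply: flexible, but invokes the lambda once per row.
df["total_apply"] = df.apply(lambda r: r["price"] * r["qty"], axis=1)

# Vectorized: the same result as one column-level multiplication.
df["total_vec"] = df["price"] * df["qty"]
```

On a frame this small the difference is invisible, but the per-row function-call overhead of `.apply` grows linearly with row count, which is what the timing plots in such articles show.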

Anna K