
I've been using Pandas for almost six months, and in my view one of the greatest debates has been about iterating over DataFrames — whether through `.iterrows()`, `.apply()`, or a list comprehension — to compute new data.

I have been advised many times to use `.loc` or similar accessors to write data whenever possible. The problem is that when I have many conditionals, what I used to solve in one line of code now requires many lines of `.iloc` to fill in the data.

In a nutshell: does it pay off to always avoid iteration and write much longer code, even when the DataFrames are not huge?
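To make the trade-off concrete, here is a minimal sketch (the column names and grading rules are made up for illustration) contrasting a row-by-row loop with a vectorized `np.select`, which handles many conditionals in a single assignment:

```python
import numpy as np
import pandas as pd

# Hypothetical data: grade a score column with several conditionals.
df = pd.DataFrame({"score": [35, 72, 88, 55, 91]})

# Iteration style: one Python-level branch per row.
grades = []
for _, row in df.iterrows():
    if row["score"] >= 90:
        grades.append("A")
    elif row["score"] >= 70:
        grades.append("B")
    elif row["score"] >= 50:
        grades.append("C")
    else:
        grades.append("F")
df["grade_loop"] = grades

# Vectorized style: np.select evaluates the conditions column-wise,
# picking the first matching choice per row.
conditions = [df["score"] >= 90, df["score"] >= 70, df["score"] >= 50]
df["grade_vec"] = np.select(conditions, ["A", "B", "C"], default="F")

assert (df["grade_loop"] == df["grade_vec"]).all()
```

The vectorized version is not necessarily shorter, but it stays one statement per output column no matter how many rows there are, which is where the speed difference comes from.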

Can anybody recommend some articles that explain this efficiency trade-off?

Daniel Arges
  • You can take a look here: [How to iterate over rows in a DataFrame in Pandas](https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas) – ThePyGuy Mar 24 '21 at 17:01
  • Shorter code is not (always) better code... – Tomerikoo Mar 24 '21 at 17:03
  • I would begrudgingly say that for a small dataset, you can use `.iterrows` but as you probably know, it returns a Series for each row so it is substantially slower than using indexing. Keep in mind that if your DataFrame ever gets larger, the performance of your code will suffer — try running some benchmarking tests on iterrows versus other methods to get an idea about the difference in performance – Derek O Mar 24 '21 at 17:11
  • @DerekO: I guess it's better to use `.itertuples` in place of `iterrows` – Pygirl Mar 24 '21 at 17:16
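Following the benchmarking suggestion above, here is an illustrative micro-benchmark sketch (column names and sizes are arbitrary; absolute timings vary by machine) comparing `iterrows`, `itertuples`, and a vectorized column operation on the same task:

```python
import timeit

import numpy as np
import pandas as pd

# Arbitrary example frame: sum two numeric columns row by row.
df = pd.DataFrame(np.random.rand(10_000, 2), columns=["a", "b"])

def with_iterrows():
    # iterrows yields an (index, Series) pair per row -- slowest.
    return [row["a"] + row["b"] for _, row in df.iterrows()]

def with_itertuples():
    # itertuples yields lightweight namedtuples -- much faster.
    return [t.a + t.b for t in df.itertuples(index=False)]

def vectorized():
    # Whole-column arithmetic runs in C, no Python-level loop.
    return (df["a"] + df["b"]).tolist()

for fn in (with_iterrows, with_itertuples, vectorized):
    elapsed = timeit.timeit(fn, number=3)
    print(f"{fn.__name__:15s} {elapsed:.3f}s")
```

All three produce the same values; typically the vectorized version is orders of magnitude faster than `iterrows`, with `itertuples` in between.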

1 Answer


There is a great article about different ways of iterating through a dataframe, and how much time each method takes. I personally found it very helpful. Take a look: https://towardsdatascience.com/apply-function-to-pandas-dataframe-rows-76df74165ee4
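The core comparison that kind of article makes can be sketched in a few lines (the `price`/`qty` columns here are invented for illustration): `.apply(axis=1)` calls a Python function once per row, while the vectorized form operates on whole columns at once.

```python
import pandas as pd

# Made-up data for illustration.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Row-wise apply: flexible, but invokes the lambda once per row.
df["total_apply"] = df.apply(lambda r: r["price"] * r["qty"], axis=1)

# Vectorized: the same result as one column-level multiplication.
df["total_vec"] = df["price"] * df["qty"]
```

On a frame this small the difference is invisible, but the per-row function-call overhead of `.apply` grows linearly with row count, which is what the timing plots in such articles show.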

Anna K