I want to know whether the pandas `apply` function (applied row-wise) can ever operate on two rows at the same time. Say I have the following class:

import time

class Counter:
    def __init__(self):
        self.count = 0
        self.value = None
    def add(self, value):
        self.value = value
        time.sleep(2)  # stands in for my actual, more complicated function
        self.count += self.value
        return self.value

I use the same instance of it within an `apply`. Will the following assertions always pass?

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
counter = Counter()
assert df['A'].apply(counter.add).tolist() == [1, 2, 3, 4, 5]
assert counter.count == 15
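
One way to check this empirically is sketched below. It is only an illustration: `OverlapDetector` and `slow_identity` are hypothetical names made up for this example, not part of pandas. The wrapper tries a non-blocking lock acquire on every call, so any call that starts while another is still running gets counted as an overlap. It also counts total calls and records which threads made them.

```python
import threading
import time

import pandas as pd


class OverlapDetector:
    """Hypothetical helper: wraps a function and records overlapping calls."""

    def __init__(self, func):
        self.func = func
        self.calls = 0           # total number of invocations
        self.overlaps = 0        # calls that began while another was running
        self.thread_ids = set()  # identifiers of the threads that called us
        self._lock = threading.Lock()

    def __call__(self, value):
        # A non-blocking acquire fails if another call is still in progress.
        acquired = self._lock.acquire(blocking=False)
        if not acquired:
            self.overlaps += 1
        self.calls += 1
        self.thread_ids.add(threading.get_ident())
        try:
            return self.func(value)
        finally:
            if acquired:
                self._lock.release()


def slow_identity(value):
    time.sleep(0.01)  # stand-in for slower real work
    return value


df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})
detector = OverlapDetector(slow_identity)
result = df['A'].apply(detector).tolist()

print(result, detector.calls, detector.overlaps, len(detector.thread_ids))
```

Seeing zero overlaps and a single thread in one run is evidence rather than a guarantee, and the call count may be worth checking separately on whatever pandas version is in use.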
  • Are you sure that `add()` should return `self.value`? Shouldn't it return `self.count` instead? – tlentali Sep 13 '21 at 09:20
  • @tlentali This is not what I actually need; it is just a small example to ask the question. I expect `df['A'].apply(counter.add)` to preserve the order [1,2,3,4,5] and not mix it up. Also, `counter.count` should always be 15, never lower. I do not mind if `apply` processes row 2 before row 1, as long as the two are not processed at the same time. – Helia Jamshidi Sep 13 '21 at 09:36
  • In older versions of Pandas, `apply` acts on the first row **twice** to attempt to optimize performance. This can cause double-counting for functions with side-effects like yours, so that `count` won't end up being 15 (see https://stackoverflow.com/questions/21635915/why-does-pandas-apply-calculate-twice). – Salmonstrikes Sep 13 '21 at 14:59
