Efficiently iterating through multiple series in a pandas dataframe

Question

I have a pandas dataframe:

A	B	C
AA	BB	CC
AAA	BBB	CCC

Now I have a function that takes the columns B and C and returns a series based on the values of columns B and C.

I have tried something like this

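  import pandas as pd

  def do_magic(a: pd.Series, b: pd.Series) -> pd.Series:

    # Element-wise helper applied to one pair of values
    def magic(aa, bb):
      return <SOME MAGIC SPELL>

    magic_spells = []

    # Walk both series in lockstep and collect one result per pair
    for aaa, bbb in zip(a, b):
      magic_spells.append(magic(aaa, bbb))

    return pd.Series(magic_spells)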

This works fine, but I am wondering if I can improve the performance of this code with something similar to the pandas dataframe apply method.

def do_magic(dataframe):
  def magic(aa, bb):
    return <SOME MAGIC SPELL>
  return dataframe.apply(lambda row: magic(dataframe['a'], dataframe['b']), axis=1)

The second function is more performant than the first. But I can't pass the entire dataframe to the function. Actually, I am trying to build a PySpark pandas_udf function.

Any help will be appreciated.

*The second function is more performant than the first* this is not true, unless you are talking about the number of characters in the code. That said, replacing `dataframe` with `row` inside the `lambda` would work. — Quang Hoang, Mar 03 '21 at 17:43
Also, the first approach would be more performant if you remove the repeated `append`: `magic_spells = [magic(aa,bb) for aa,bb in zip(a,b)]`. — Quang Hoang, Mar 03 '21 at 17:45
@QuangHoang performance-wise, converting to a list comprehension made little or no difference — Hasif Subair, Mar 03 '21 at 18:19
On the contrary, repeated `append` requires reallocating/copying the data whenever the list outgrows some threshold. The two are often comparable for relatively small data. That said, the main point is that `apply` wouldn't give you any advantage, at least compared to switching to a list comprehension. — Quang Hoang, Mar 03 '21 at 18:22
See more details on why `apply` is not an improvement [here](https://stackoverflow.com/questions/54432583/when-should-i-not-want-to-use-pandas-apply-in-my-code). — Quang Hoang, Mar 03 '21 at 18:24
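
To make the points from the comments concrete, here is a minimal sketch, not a benchmark. `magic` is a hypothetical stand-in for the elided spell (it just joins the two strings), and the columns `B` and `C` come from the sample table in the question. It puts the list-comprehension version next to the `apply` version with `row` used inside the `lambda`.

```python
import pandas as pd


def magic(aa, bb):
    # Hypothetical stand-in for the elided "magic spell" (assumption)
    return f"{aa}-{bb}"


def do_magic(a: pd.Series, b: pd.Series) -> pd.Series:
    # List comprehension over both series in lockstep, no repeated append
    return pd.Series([magic(aa, bb) for aa, bb in zip(a, b)])


def do_magic_apply(dataframe: pd.DataFrame) -> pd.Series:
    # Row-wise apply: use `row`, not `dataframe`, inside the lambda
    return dataframe.apply(lambda row: magic(row["B"], row["C"]), axis=1)


# Sample data mirroring the table in the question
df = pd.DataFrame({"A": ["AA", "AAA"], "B": ["BB", "BBB"], "C": ["CC", "CCC"]})

from_series = do_magic(df["B"], df["C"])
from_apply = do_magic_apply(df)
print(from_series.tolist())
print(from_apply.tolist())
```

Both versions should give the same values. The first one takes and returns only `pd.Series`, which is the shape a Series-to-Series PySpark `pandas_udf` works with; wrapping it in the decorator is not shown here.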
