What's the fastest way to apply a function over rows of two big DataFrames in pandas?

Question

I have two datasets. The first has about 61k rows and looks like this:

       type  power  bonus
 0   Eletric     10      3
 1    Flying      5      5
..      ...    ...    ...
2     Grass     10      5

[61000 rows x 3 columns]

The second has about 650k rows and looks like this:

      pokemon     type  attack
 0    Pikachu  Eletric     105
 1  Bulbasaur    Grass      90
..        ...      ...     ...
 2     Treeko    Grass     105
 3  Dragonite   Flying     125

[650000 rows x 3 columns]

I want to join the two datasets on type (rows match when their type values are equal) and apply this formula to each row of the joined table:

points = attack * power + bonus

so that in the end I get a Series indexed by pokemon, like this:

pokemon
Pikachu      1053
Bulbasaur     905
              ...
Treeko       1055
Dragonite     630
Name: points, Length: 650000

I've already written a solution using `DataFrame.apply`, but in my opinion it takes too long. What's the fastest way to handle data of this size? Should I give up on pandas and work with native Python data structures instead?
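Here is a minimal sketch of a vectorized alternative to a row-wise `apply`. It assumes that type is unique in the first dataset, so each type has exactly one power/bonus pair. The expected 650000-row output implies this.

The names `df_types` (first dataset) and `df_pokemon` (second dataset) are hypothetical, and the frames are built here from the sample rows only. Instead of a full merge, the sketch looks up power and bonus per pokemon with `Series.map`. It then evaluates the formula on whole columns, so no Python-level loop runs per row.

```python
import pandas as pd

# Hypothetical stand-ins for the two datasets, built from the sample rows.
# Assumption: each type appears only once in df_types.
df_types = pd.DataFrame({
    "type": ["Eletric", "Flying", "Grass"],
    "power": [10, 5, 10],
    "bonus": [3, 5, 5],
})
df_pokemon = pd.DataFrame({
    "pokemon": ["Pikachu", "Bulbasaur", "Treeko", "Dragonite"],
    "type": ["Eletric", "Grass", "Grass", "Flying"],
    "attack": [105, 90, 105, 125],
})

# Index the lookup table by type so .map can translate each pokemon's type.
lookup = df_types.set_index("type")
power = df_pokemon["type"].map(lookup["power"])
bonus = df_pokemon["type"].map(lookup["bonus"])

# Vectorized formula: points = attack * power + bonus, computed column-wise.
points = df_pokemon["attack"] * power + bonus
points.index = pd.Index(df_pokemon["pokemon"], name="pokemon")
points.name = "points"
print(points)
```

If a pokemon's type is missing from `df_types`, `map` yields NaN for that row and the result becomes a float Series.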

You can try something like: `df2.merge(df1, on="type", how="left").eval("attack * power + bonus")` (phi, Jun 24 '22 at 10:25)
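The comment's one-liner can be expanded into a runnable sketch that produces the Series shape asked for in the question. Here `df1`/`df2` are renamed to the hypothetical `df_types`/`df_pokemon`, built from the sample rows only.

The sketch again assumes type is unique in the first dataset. Otherwise the merge emits one row per matching pair, and the result grows beyond one row per pokemon. `validate="many_to_one"` makes that assumption explicit. Calling `set_index("pokemon")` before `DataFrame.eval` keeps the pokemon names as the index of the result.

```python
import pandas as pd

# Hypothetical stand-ins for the two datasets (sample rows only).
df_types = pd.DataFrame({
    "type": ["Eletric", "Flying", "Grass"],
    "power": [10, 5, 10],
    "bonus": [3, 5, 5],
})
df_pokemon = pd.DataFrame({
    "pokemon": ["Pikachu", "Bulbasaur", "Treeko", "Dragonite"],
    "type": ["Eletric", "Grass", "Grass", "Flying"],
    "attack": [105, 90, 105, 125],
})

# validate="many_to_one" raises a MergeError if a type occurs more than once
# in df_types, which would otherwise duplicate pokemon rows.
points = (
    df_pokemon.merge(df_types, on="type", how="left", validate="many_to_one")
    .set_index("pokemon")                 # keep pokemon names as the index
    .eval("attack * power + bonus")       # column-wise evaluation of the formula
    .rename("points")                     # name the resulting Series
)
print(points)
```

A left merge keeps the row order of `df_pokemon`, so the output lists pokemon in their original order.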


0 Answers