0

I have two DataFrames that look like this:

df1 (pretty small):

index sales
1 10
2 20

and df2 (very large >5Mil):

idx1 idx2
1 2

and I want the final to look like this:

idx1 idx2 totalSales
1 2 30

I currently have this working but it is very slow:

df2['totalSales'] = df2.apply(lambda x: df1.loc[x]['sales'].sum(), axis=1)

Are there any faster/better ways to go about this? This works for me just fine, but it takes a very long time to run. Thanks in advance!

InnKeeper
  • 27
  • 4

1 Answers1

1

This should be faster than apply:

df2['totalSales'] = df2.idx1.map(df1.sales) + df2.idx2.map(df1.sales)

df2 
#   idx1  idx2  totalSales
#0     1     2          30
Psidom
  • 209,562
  • 33
  • 339
  • 356
  • Hi, what is the difference between `apply` and `map` for a series? – Rishabh Kumar Mar 13 '21 at 04:53
  • @RishabhKumar Here is a good explanation: https://stackoverflow.com/a/56300992/4983450. TLDR: `map` is an optimized version that can accept dict / series while `apply` is more flexible but also slower. – Psidom Mar 13 '21 at 04:59