
I want to merge the following DataFrame with the Series object below it (details below them):

df1

    wallet  cumulative_rewards  position  position_rewards position_type
0  0x12345                   0      LUSD              1000       LP Pair
0  0x12345                   0      TOKE               200      LP Token
1  0xabcde                   0    UNI_LP               500            LP
1  0xabcde                   0  SUSHI_LP                 0            LP
1  0xabcde                   0       DAI               100       LP Pair

series_object: grouped_toke_transfers = toke_transfers_df.groupby('from_address')['value'].sum(), where the index is from_address and the values of the Series are the summed values:

from_address   value
0x12345        13687137402763990447827
0xabcde        58950104860666120622
0xfghij        3491287228881431880579
0xklmno        1260986666816869789

Merged df details:

  1. Keep the wallets from df1 (repeated indexes/wallets);
  2. Add value from df2 to each corresponding wallet (the same value per wallet should appear on each row for that wallet);
  3. The merged df should not include wallets from df1 where that wallet does not exist in df2.

Bonus: What I want to do, ultimately, is query the merged table for value (from df2) based on the position_type (from df1). How would I go about doing this with groupby()?

Luiz Scheuer

1 Answer


Use:

print (df1)
    wallet  cumulative_rewards  position  position_rewards position_type
0  0x12345                   0      LUSD              1000       LP Pair
0  0x12345                   0      TOKE               200      LP Token
1  0xabcde                   0    UNI_LP               500            LP
1  0xabcde                   0  SUSHI_LP                 0            LP
1      eee                   0       DAI               100       LP Pair

Map solution - create a new column value with Series.map, then set wallet to missing where value is missing using Series.mask:

grouped_toke_transfers = toke_transfers_df.groupby('from_address')['value'].sum()
df1['value'] = df1['wallet'].map(grouped_toke_transfers)
df1['wallet'] = df1['wallet'].mask(df1['value'].isna())

print (df1)
    wallet  cumulative_rewards  position  position_rewards position_type  \
0  0x12345                   0      LUSD              1000       LP Pair   
0  0x12345                   0      TOKE               200      LP Token   
1  0xabcde                   0    UNI_LP               500            LP   
1  0xabcde                   0  SUSHI_LP                 0            LP   
1      NaN                   0       DAI               100       LP Pair   

                     value  
0  13687137402763990447827  
0  13687137402763990447827  
1     58950104860666120622  
1     58950104860666120622  
1                      NaN 
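To also satisfy requirement 3 (exclude wallets not present in the Series) and answer the bonus question, a minimal sketch along these lines should work. The sample data here is illustrative (the `0xfffff` wallet and the `merged`/`totals` names are my own, not from the original post):

```python
import pandas as pd

# Illustrative data mirroring the question; values exceed int64, so the
# Series holds Python ints (object dtype), which pandas sums exactly.
df1 = pd.DataFrame({
    'wallet': ['0x12345', '0x12345', '0xabcde', '0xabcde', '0xfffff'],
    'position_type': ['LP Pair', 'LP Token', 'LP', 'LP', 'LP Pair'],
    'position_rewards': [1000, 200, 500, 0, 100],
})
grouped_toke_transfers = pd.Series(
    {'0x12345': 13687137402763990447827, '0xabcde': 58950104860666120622},
    name='value',
)

# Map each wallet to its summed transfer value (NaN where no match)
df1['value'] = df1['wallet'].map(grouped_toke_transfers)

# Requirement 3: drop rows whose wallet is not in the Series
merged = df1.dropna(subset=['value'])

# Bonus: total value per position_type
totals = merged.groupby('position_type')['value'].sum()
print(totals)
```

Here `dropna` replaces the `mask` step from the answer above, since the question asks for non-matching wallets to be excluded entirely rather than kept as NaN.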
jezrael
  • Actually, I apologize, I had a mistake in terms of the data type of `df2`. I updated the question to show the actual data type now. It is a `Series` object where the `index` is `from_address`, so `from_address` is not a column. That's why your solution wasn't working for me. – Luiz Scheuer Sep 08 '22 at 11:00
  • 1
    @LuizScheuer - answer was edited. – jezrael Sep 08 '22 at 11:01
  • Checking now, thanks! – Luiz Scheuer Sep 08 '22 at 11:03
  • What it does now is it is giving me `NaNs` for both the `wallet` and `value` columns (although the index is correct). I'm also checking here if there is anything wrong from my side. – Luiz Scheuer Sep 08 '22 at 11:06