1

I have the dataframe df1 as

  Userid        timestamp      pnl_of_fills variation                                         
    111       1.643955e+09         20.00       bad
    122       1.643955e+09          2.67      good
    221       1.643955e+09          3.33      good
    2735      1.643955e+09          6.60      good
    31124243  1.643955e+09         40.00       bad

which was created with groupby Userid in pandas dataframe.

and df2 as

   Userid
  
0    111
1    122
2    221
3    2735
4    111

I need new dataframe with combine df1 and df2 as df3

df3 like

   Userid  pnl_of_fills  Variation 
  
0    111    20.00         bad
1    122    2.67         good
2    221    3.33         good
3    2735    6.60        good 
4.   111    20.00         bad  

Have tried multiple approaches and didnt find anything useful.

Basically for every value of df2 column, the df1 values needs to be updated.

Madan
  • 226
  • 2
  • 8

1 Answers1

0

Suppose your 2 dataframes:

>>> df1
             timestamp  pnl_of_fills variation
Userid                                        
111       1.643955e+09         20.00       bad
122       1.643955e+09          2.67      good
221       1.643955e+09          3.33      good
2735      1.643955e+09          6.60      good
31124243  1.643955e+09         40.00       bad

>>> df2
   Userid  Dummy
0     111     11
1     122     21
2     221     31
3    2735     41
4     111     51

You want to merge them on Userid: the problem here is for the first dataframe, Userid is the index rather than for the second Userid is a column so you need some extra options to use with merge:

# Merge with index of df1 and column on df2
>>> df1.merge(df2, left_index=True, right_on='Userid')
      timestamp  pnl_of_fills variation  Userid  Dummy
0  1.643955e+09         20.00       bad     111     11
4  1.643955e+09         20.00       bad     111     51
1  1.643955e+09          2.67      good     122     21
2  1.643955e+09          3.33      good     221     31
3  1.643955e+09          6.60      good    2735     41

# Merge both on columns because you reset index of df1
>>> df1.reset_index().merge(df2, on='Userid') 
      timestamp  pnl_of_fills variation  Userid  Dummy
0  1.643955e+09         20.00       bad     111     11
4  1.643955e+09         20.00       bad     111     51
1  1.643955e+09          2.67      good     122     21
2  1.643955e+09          3.33      good     221     31
3  1.643955e+09          6.60      good    2735     41

# Light merge with join based on index of both dataframes
>>> df1.join(df2.set_index('Userid'))
             timestamp  pnl_of_fills variation  Dummy
Userid                                               
111       1.643955e+09         20.00       bad   11.0
111       1.643955e+09         20.00       bad   51.0
122       1.643955e+09          2.67      good   21.0
221       1.643955e+09          3.33      good   31.0
2735      1.643955e+09          6.60      good   41.0
31124243  1.643955e+09         40.00       bad    NaN

You can get more information about merge: Pandas Merging 101

Corralien
  • 109,409
  • 8
  • 28
  • 52