I need some guidance on pandas, which is unfamiliar territory for me as a frontend developer. I am comfortable with the DataFrame concept now, and I am looking for a way to create a new DataFrame by comparing two other DataFrames. What should I be looking for in pandas to do this?

For example, consider df1 as

 Date            col1     col2     col3     id
 2017-04-14      2482        1        0     a2
 2017-04-15      2483        1        0     a3

and df2 as

 Date            col1     col2     col3     id
 2017-04-15      2483       10       20     a3
 2017-04-14      2482       11        0     a2

What I am trying to achieve is a new DataFrame that lists only the values that differ, with val_diff computed as df1_value minus df2_value, like

 Date            df1_value    df2_value    diff_col_name    val_diff     id
 2017-04-14      1            11           col2             -10          a2
 2017-04-15      1            10           col2              -9          a3
 2017-04-15      0            20           col3             -20          a3

I was able to join the two DataFrames on id with df1.merge(df2, on='id', how='left'), but what should the next step be? How do I compare the differences and build the final DataFrame?

Sam
    Possible duplicate of [Outputting difference in two Pandas dataframes side by side - highlighting the difference](http://stackoverflow.com/questions/17095101/outputting-difference-in-two-pandas-dataframes-side-by-side-highlighting-the-d) – philshem May 16 '17 at 07:14

1 Answer

Setup

import pandas as pd

df1 = pd.DataFrame({'Date': {0: '2017-04-14', 1: '2017-04-15'},
 'col1': {0: 2482, 1: 2483},
 'col2': {0: 1, 1: 1},
 'col3': {0: 0, 1: 0},
 'id': {0: 'a2', 1: 'a3'}})

df2 = pd.DataFrame({'Date': {0: '2017-04-15', 1: '2017-04-14'},
 'col1': {0: 2483, 1: 2482},
 'col2': {0: 10, 1: 11},
 'col3': {0: 20, 1: 0},
 'id': {0: 'a3', 1: 'a2'}})

Solution

# Melt both DataFrames from wide to long format (one row per Date, id and
# column name), then merge them on those three keys.
dfm = pd.merge(pd.melt(df1, id_vars=['Date', 'id']),
               pd.melt(df2, id_vars=['Date', 'id']),
               how='outer', on=['Date', 'id', 'variable'])

# Rename the columns; value_x comes from df1 and value_y from df2.
dfm.columns = ['Date', 'id', 'diff_col_name', 'df1_value', 'df2_value']
# Compute the difference between the two values.
dfm['val_diff'] = dfm.df1_value - dfm.df2_value
# Reorder the columns to match the requested layout.
dfm = dfm[['Date', 'df1_value', 'df2_value', 'diff_col_name', 'val_diff', 'id']]
# Keep only the rows where the values differ.
dfm = dfm[dfm.val_diff != 0]

Out[2001]: 
         Date  df1_value  df2_value diff_col_name  val_diff  id
2  2017-04-14          1         11          col2       -10  a2
3  2017-04-15          1         10          col2        -9  a3
5  2017-04-15          0         20          col3       -20  a3
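
If this comparison is needed more than once, the same melt, merge and filter steps could be wrapped in a small helper. The following is only a sketch under a few assumptions: the function name diff_frames and its keys parameter are made up for illustration, all non-key columns are numeric, and each key combination appears at most once per DataFrame. It names the value columns directly in melt instead of renaming afterwards, and it puts the key columns first rather than reproducing the exact column order above.

```python
import pandas as pd


def diff_frames(left, right, keys=("Date", "id")):
    """Hypothetical helper: one row per (keys, column) pair whose values differ."""
    keys = list(keys)
    # Wide to long: one row per key combination and original column name.
    left_long = left.melt(id_vars=keys, var_name="diff_col_name",
                          value_name="df1_value")
    right_long = right.melt(id_vars=keys, var_name="diff_col_name",
                            value_name="df2_value")
    # Align the two long frames on the keys plus the column name.
    merged = left_long.merge(right_long, on=keys + ["diff_col_name"],
                             how="outer")
    merged["val_diff"] = merged["df1_value"] - merged["df2_value"]
    # Rows present on only one side get a NaN difference; NaN != 0 is True,
    # so those rows are kept along with the genuinely different values.
    changed = merged[merged["val_diff"] != 0]
    return changed.reset_index(drop=True)


df1 = pd.DataFrame({"Date": ["2017-04-14", "2017-04-15"],
                    "col1": [2482, 2483],
                    "col2": [1, 1],
                    "col3": [0, 0],
                    "id": ["a2", "a3"]})
df2 = pd.DataFrame({"Date": ["2017-04-15", "2017-04-14"],
                    "col1": [2483, 2482],
                    "col2": [10, 11],
                    "col3": [20, 0],
                    "id": ["a3", "a2"]})

result = diff_frames(df1, df2)
print(result)
```

The row order of an outer merge can differ between pandas versions, so it is safer to sort the result explicitly (for example with sort_values) than to rely on the order shown in the output above.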
Allen Qin