1

I'm looking for a solution similar to this one but for multiple columns. So I have this table1:

date   client_id   product_id  date2   col2   col3
1/1/15 1           A           1/1/20  50     AAA
1/2/15 2           B           1/3/30  40     BBB
1/3/15 2           B           NaN     NaN    NaN

where I need to merge it with table2:

date   client_id   product_id  date3   col2  col3
1/1/15 1           A           1/1/20  1000   XXX
1/2/15 2           B           NaN     NaN    NaN
1/3/15 2           B           2/3/27  3000   ZZZ

Resulting with a table1 like this (output expected):

date   client_id   product_id  date2   col2   col3
1/1/15 1           A           1/1/20   50     AAA
1/2/15 2           B           1/3/30   40     BBB
1/3/15 2           B           2/3/27   3000   ZZZ

The number of rows for table1 and table2 are the same, as the key columns (date, client_id, product_id) for both tables are the same. However, their main difference is on date2, date3. If date2 is null, all the col columns will be null (from col2 to col500), Same for date3. However, I need to replace where date2 is null with values of date3 given the key columns.

The solution in the link at the beginning creates an auxiliar column for each column you need to replace, but it's impractical for a dataset of 500 columns.

Any suggestions?

Chris
  • 2,019
  • 5
  • 22
  • 67

1 Answers1

0

One quick way is to use concat and groupby:

(pd.concat([table1, table2.rename(columns={'date3':'date2'})])
   .groupby(['date','client_id'], as_index=False)
   .first()
)

Output:

     date  client_id product_id   date2    col2 col3
0  1/1/15          1          A  1/1/20    50.0  AAA
1  1/2/15          2          B  1/3/30    40.0  BBB
2  1/3/15          2          B  2/3/27  3000.0  ZZZ
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • Can you explain a little bit? It keep the first of what? Should I include product_id also in group_by because it's one of the keys to merge? – Chris Nov 26 '20 at 13:51
  • Yes, I guess I didn’t read the question thoroughly. The idea is groupby can act like merge while first will ignore NaN and try to find the first valid values. – Quang Hoang Nov 26 '20 at 13:59