0

I would like two merge these two dataframes:

df1
    a   b   c                              
4   0   1   3
5   0   1   3
6   0   1   3
2   1   2   2
3   1   2   2
15  2   1   3
12  2   1   3
13  2   1   3
15  3   1   4
14  3   1   4
8   3   1   4
9   3   1   4
df2

b  c  e
1  3  2
1  3  0
1  3  1

df1 should be grouped by a and merged with df2 if they are equal within one group of a. after merging/joining it should look like that:

    a   b   c   e                              
4   0   1   3   2
5   0   1   3   0
6   0   1   3   1
2   1   2   2   nan
3   1   2   2   nan
15  2   1   3   2
12  2   1   3   0
13  2   1   3   1
15  3   1   4   nan
14  3   1   4   nan
8   3   1   4   nan
9   3   1   4   nan

Can anyone help me?

MirekG
  • 161
  • 6
  • what is the group key for df2 – sammywemmy Apr 27 '20 at 09:45
  • I edited the post. I always want to put df2 into df1 if b and c are equal in both df's. – MirekG Apr 27 '20 at 09:53
  • In the example, matching groups have exactly 3 rows. Is this always guaranteed? – Serge Ballesta Apr 27 '20 at 10:19
  • no, only if df1 has 3 rows in one group it should be merged with df2. There will be other df's just like df2 that should merge groups of 2 rows,4 rows and so on. It could happen that there is for example two exact same groups of 3 underneath each other but they would differ in a. – MirekG Apr 27 '20 at 11:01
  • Sorry. Yes in this case it's exactly 3. – MirekG Apr 27 '20 at 11:27

2 Answers2

0

You do not really want a merge, because you want to copy the values for 3 rows when the values of 2 columns are common for those 3 rows.

I would use a dedicated function to control that a group does contain exactly as much as rows as df2 and that the values are the same:

def tst(x):
    if len(x) != 3: return False
    x = x.reset_index()[['b', 'c']]
    return (x == df2[['b', 'c']]).all().all()

We can then write:

new_col = df1.groupby('a').apply(lambda x: df2['e'] if tst(x) else pd.Series(
    pd.np.nan, index = x.index))
Serge Ballesta
  • 143,923
  • 11
  • 122
  • 252
-1
new_df = pd.merge(df1, df2,  how='left', left_on=['b','c'], right_on = ['b','c'])

This should work in your case. Doc link