merge two dataframe that share the same column value

Question

I have two dataframe df1 =

C	L	O	D	LN	M	x	y	z
FR	312	73	2021-04-09	1	W	1	0	0
FR	312	73	2021-04-09	1	W	0	1	0
FR	312	73	2021-04-09	1	W	0	0	1
FR	312	73	2021-04-09	3	W	1	0	0
FR	312	73	2021-04-09	3	W	0	1	0
FR	312	73	2021-04-09	3	W	0	0	1
FR	312	73	2021-04-09	4	W	1	0	0
FR	312	73	2021-04-09	4	W	0	1	0
FR	312	73	2021-04-09	4	W	0	0	1
FR	312	73	2021-04-09	5	W	1	0	0
FR	312	73	2021-04-09	5	W	0	1	0

df2 =

C	L	O	D	LN	M	x	y	z	w
FR	312	73	2021-04-09	1	L	1	0	0	1

I would like to concatenate or merge those 2 df on C | L | O | D | LN

output desired =

C	L	O	D	LN	M	x	y	z	w
FR	312	73	2021-04-09	1	W	1	0	0	0
FR	312	73	2021-04-09	1	W	0	1	0	0
FR	312	73	2021-04-09	1	W	0	0	1	0
FR	312	73	2021-04-09	1	L	0	0	0	1

thanks for your help

Does this answer your question? [how to merge two data frames based on particular column in pandas python?](https://stackoverflow.com/questions/37697195/how-to-merge-two-data-frames-based-on-particular-column-in-pandas-python) — Mahrkeenerh, Nov 03 '21 at 11:53
not really. I have the exact same columns on both df. the output should be the same column name — John Mack, Nov 03 '21 at 11:56
I was going off of the title, since the post was badly formatted. Now it seems like you want to filter the first dataframe based on the second one? Doesn't really seem like merging at all — Mahrkeenerh, Nov 03 '21 at 12:01
it is not an inner merge ? df2 is just a row, but in reality its a more than one — John Mack, Nov 03 '21 at 12:03
I guess it does, try: https://stackoverflow.com/questions/41815079/pandas-merge-join-two-data-frames-on-multiple-columns — Mahrkeenerh, Nov 03 '21 at 12:06
I already tried this but it doesnt give what I want. I dont want to merge on x,y,z,w and I dont want to have an output with x_l, x_r, y_l, y_l, z_r, z_l, w_l, w_r — John Mack, Nov 03 '21 at 12:09
why ? I need them in my output. if it is possible how do you do that and merge it back ? — John Mack, Nov 03 '21 at 12:28
@Mahrkeenerh please can you elaborate your idea with an example ? If you don't know the right answer it is completely fine. thanks for your help — John Mack, Nov 03 '21 at 12:33
if you need the values from both dataframes, it's not a merge. Merge adds extra data to existing rows, doesn't create new rows (if the values exist, which they do in your case). So it looks like you want to filter the data, and then append your second df — Mahrkeenerh, Nov 03 '21 at 12:38
@Mahrkeenerh ok but how do you filter then ? Iam not sure you understand the problem. but thanks for your input — John Mack, Nov 03 '21 at 12:44
You forgot to mention your df2 has 20k rows ... from your question, it has just one, so unless you explain something, don't expect anyone to know it. You can still make a duplicate df2 for merging, by dropping the unnecessary columns (M, x, y, z, w). After merge, you might however get some rows, with Nan values, drop them too. And after that, just combine the df2 with your merged df. It now has filtered rows from df1 according to your rows from df2, and also has all df2 rows — Mahrkeenerh, Nov 03 '21 at 13:57
@Mahrkeenerh your answer is not clear. please provide a constructed answer with code if possible. also in my previous comment I mentioned that df2 is not one row. I just made it simple. but filtering is defenetly not the right way. and finaly I don't know why you saying drop unnecessary columns. all the columns are very important and cannot be dropped — John Mack, Nov 03 '21 at 15:41
that's because I didn't provide an answer, but a `how to` instead. I also said you can drop the columns in your new duplicated df2 that will be used for merging, not to drop them from the original one. — Mahrkeenerh, Nov 03 '21 at 15:45
@ but you how to is not helpfull as it is not constructed. you should try your thinking before giving an answer to see if it works first. but what you doing is just writing your thinking whithout being sure if that works. if you want here is some code so you can try: — John Mack, Nov 03 '21 at 15:51
@Mahrkeenerh df = pd.DataFrame( {"O": [73, 73, 73, 73, 73, 73,73, 73, 73,73, 73, 73,73, 73, 73],"D": ['2021-04-09', '2021-04-09', '2021-04-09', '2021-04-09', '2021-04-09', '2021-04-09','2021-04-09', '2021-04-09', '2021-04-09','2021-04-09', '2021-04-09', '2021-04-09','2021-04-09', '2021-04-09', '2021-04-09'],"LN": [1, 1, 1, 5, 5, 5, 5, 5, 5, 5,7,7,7,7,7],"M": ['W','W','W','W','W','W','W','W','W','W','W','W','W','W','W'],"x": — John Mack, Nov 03 '21 at 15:54
[1,0,0,1,0,0,1,0,0,1,0,0,1,0,0],"y": [0,0,1,0,0,1,0,0,1,0,0,1,0,0,1],"z": [0,1,0,0,1,0,0,1,0,0,1,0,0,1,0],"w": [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"a": [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], "b": [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],"c": [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],},columns=["O","D", "LN", "M","x", "y", "z",'w','a', 'b', 'c']) — John Mack, Nov 03 '21 at 15:54
df_one = pd.DataFrame( { "O": [73, 73, 73, 73], "D": ['2021-04-09', '2021-04-09', '2021-04-09', '2021-04-09'], "LN": [1, 1, 1, 1], "M": ['1','1','1','1'], "x": [0,0,0,0], "y": [0,0,0,0], "z": [0,0,0,0], "w": [0,0,0,1], "a": [0,0,1,0], "b": [0,1,0,0], "c": [1,0,0,0], }, columns=["O","D", "LN", "M","x", "y", "z",'w','a', 'b', 'c'] ) — John Mack, Nov 03 '21 at 15:55

Mahrkeenerh · Answer 1 · 2021-11-03T16:07:54.967

As I was trying to point out in the comments, a possible solution is:

duplicate df2, and drop columns that are not wanted (for merging)
use inner merge on df2_dupli and df1
possibly remove Nan rows, not sure if this step is required
combine df2 and df_merge (concat)
profit

# initialize
df1 = pd.DataFrame(data={'A': [1, 2, 3], 'B': ["abc", "abc", "cde"], 'C': [0, 1, 0]})
df2 = pd.DataFrame(data={'A': [1, 2], 'B': ["abc", "abc"], 'C': [5, 5]})

# duplicate and drop
df2_dupli = df2.copy()
df2_dupli = df2_dupli.drop(columns=['C'])

# inner merge
df_merge = pd.merge(df1, df2_dupli,  how='inner', left_on=['A','B'], right_on = ['A','B'])

# combine
df_out = pd.concat([df_merge, df2])

>>> df1
   A    B  C
0  1  abc  0
1  2  abc  1
2  3  cde  0

>>> df2
   A    B  C
0  1  abc  5
1  2  abc  5

>>> df_out
   A    B  C
0  1  abc  0
1  2  abc  1
0  1  abc  5
1  2  abc  5

df_out now contains all rows from df2, and rows from df1 that have the same values in specified columns as df2.

score 1 · Accepted Answer · answered Nov 03 '21 at 16:24

You could set up the indexes such that they align with the data you want to match:

df1.set_index(['C','L','O','D','LN'], inplace=True) 
df2.set_index(['C','L','O','D','LN'], inplace=True)

then you could merge both dfs into a single df containing all records

df_concat = pd.concat([df1, df2], axis=0)

All thats left is to find the indices that exist in both dataframes, and use it on the concatenated df:

i2keep = set(df1.index.tolist()) and set(df2.index.tolist())
df_concat.loc[i2keep,:].reset_index()

yields

    C    L   O         D  LN  M  x  y  z  w                                                                             
0  FR  312  73  4/9/2021   1  W  1  0  0  0                                                                             
1  FR  312  73  4/9/2021   1  W  0  1  0  0                                                                             
2  FR  312  73  4/9/2021   1  W  0  0  1  0                                                                             
3  FR  312  73  4/9/2021   1  L  1  0  0  1

merge two dataframe that share the same column value

2 Answers2