merging two dataframes of different sizes

Question

so I want to merge two dataframes that look like this:

df1

id	object	position	x	y
abc	J	3	0	1
abc	C	13	1	0
cde	C	2	0	1

etc etc (df1 is much smaller than df2)

df2

	id	object	position	bla	bla
1	abc	D	1	bla	bla
2	abc	S	2	bla	bla
3	abc	J	3	bla	bla
4	abc	J	4	bla	bla
...	...	...	...	...	...
12	abc	X	12	bla	bla
13	abc	C	13	bla	bla
...	...	...	...	...	...
45	cde	J	1	bla	bla
46	cde	C	2	bla	bla

etc etc etc

what I want is a code that can merge x y z from df1 into df2, using id, object and position to correctly map x y z, so my output would look like this

	id	object	position	bla	bla	x	y	z
1	abc	D	1	bla	bla	0	0	0
2	abc	S	2	bla	bla	0	0	0
3	abc	J	3	bla	bla	0	1	0
4	abc	J	4	bla	bla	0	0	0
...	...	...	...	...	...	...	...
12	abc	X	12	bla	bla	0	0	0
13	abc	C	13	bla	bla	1	0	0
...	...	...	...	...	...	...	...
45	cde	J	1	bla	bla	0	0	0
46	cde	C	2	bla	bla	0	1	0

so the stuff in bold are lines that are found in df1 where x, y, and z can be copied.

no matter what I try x y z in the merged tables ends up being full of 0s.

tried:

merged_df = df2.merge(df1, left_on=['id', 'Column A', 'position'], right_on=['id', 'object', 'position'], how='left')

merged_df = merged_df.drop(columns=['object'])
merged_df = merged_df.rename(columns={'x_x': 'x', 'y_x': 'y', 'z_x': 'z'})

merged_df = merged_df.fillna(0)

Everything ends up being 0

also tried:

merged_df = df1.merge(df2, on=['id', 'object', 'position'], how='left')

test_set_df['index'] = merged_df['index']

max_index_value = df2.index.max()
for index_value in range(1, max_index_value + 1):
    if index_value not in df1['index'].values:
        new_row = pd.DataFrame({'id': [''], 'object': [''], 'position': [''], 'x': [0], 'y': [0], 'z': [0], 'index': [index_value]})
        df1 = pd.concat([df1, new_row], ignore_index=True)

df1.sort_values(by='index', inplace=True)

df1.reset_index(drop=True, inplace=True)

df2.rename(columns={'x': 'x_alpha', 'y': 'y_alpha', 'z': 'z_alpha'}, inplace=True)

df2_annotated = df2

df2_annotated = df2_annotated.merge(test_set_df[['index', 'x', 'y', 'z']], on='index', how='left')

Which exact command did you use? Also provide a minimal reproducible example of your inputs, without `...` and the exact matching expected output. — mozway, Jul 22 '23 at 20:21
Welcome to Stack Overflow! Please take the [tour] and read [ask]. For help with your code, you need to make a [mre], cause how are we supposed to know why your code isn't working if you haven't shown it? Please also make sure to provide complete example input and complete desired output, cause your input `df1` doesn't contain `cde, C, 2`. For specifics, see [How to make good reproducible pandas examples](/q/20109391/4518341). FWIW, there's an existing broad question: [Pandas Merging 101](/q/53645882/4518341) — wjandrea, Jul 22 '23 at 20:24
I see your edits, but that's still not an MRE. To start, `df2` doesn't contain `Column A` anymore, columns ending in `_x` don't exist, and `df1` still doesn't contain `cde, C, 2`. — wjandrea, Jul 22 '23 at 23:17

score 0 · Answer 1 · answered Jul 22 '23 at 20:27

0

merge both dataframes based on id and position columns

df_final = pd.merge(df2, df1, on=['id','position'], how='left')
df_final[['x', 'y', 'z']] = df_final[['x', 'y', 'z']].fillna(0)
print(df_final)

answered Jul 22 '23 at 20:27

yashaswi k

668
7
17

What about `"object"`? OP said that should be part of the merge too, though their output seems to say different – wjandrea Jul 22 '23 at 20:44
Hmm, the result gets converted to float. Do you know if there's a way to keep it as int? `marge` doesn't have a `fill_value` parameter so that doesn't work... – wjandrea Jul 22 '23 at 21:03
@wjandrea thanks for pointing it out , as output dataframe didn't contain object column haven't considered it while merging – yashaswi k Jul 22 '23 at 21:08
`df_final[['x', 'y', 'z']] = df_final[['x', 'y', 'z']].fillna(0).astype(int)` should do the job – yashaswi k Jul 22 '23 at 21:09
It seems like Column A corresponds to `"object"` (but again, the output doesn't match that) – wjandrea Jul 22 '23 at 21:09
Oh, I meant to not convert to float in the first place. NBD in this case but I'm just thinking it could cause problems in other cases. Anyway, I found a way to do it: `df1.set_index(cols_right).reindex(df2[cols_left], fill_value=0).pipe(lambda d1: df2.join(d1, on=cols_left, how='left'))` – wjandrea Jul 22 '23 at 21:14
yeah, then either of the column should be renamed for merging – yashaswi k Jul 22 '23 at 21:15
1

Oh, you don't need to rename them. You can just specify `left_on=` and `right_on=`, like `pd.merge(df2, df1, left_on=cols_left, right_on=cols_right, how='left')`, where `cols_common = ['id', 'position']; cols_left = cols_common + ['Column A']; cols_right = cols_common + ['object']` – wjandrea Jul 22 '23 at 21:16
Hiya, I tried doing this and x, y, z just merged left of the first few lines of df2 regardless of id, object and position and the rest of the table got filled with NaN, any reason which could be causing this? – margarine Jul 22 '23 at 22:20

	id	object	position	bla	bla
1	abc	D	1	bla	bla
2	abc	S	2	bla	bla
3	abc	J	3	bla	bla
4	abc	J	4	bla	bla
...	...	...	...	...	...
12	abc	X	12	bla	bla
13	abc	C	13	bla	bla
...	...	...	...	...	...
45	cde	J	1	bla	bla
46	cde	C	2	bla	bla

	id	object	position	bla	bla
1	abc	D	1	bla	bla
2	abc	S	2	bla	bla
3	abc	J	3	bla	bla
4	abc	J	4	bla	bla
...	...	...	...	...	...
12	abc	X	12	bla	bla
13	abc	C	13	bla	bla
...	...	...	...	...	...
45	cde	J	1	bla	bla
46	cde	C	2	bla	bla

merging two dataframes of different sizes

1 Answers1

	id	object	position	bla	bla
1	abc	D	1	bla	bla
2	abc	S	2	bla	bla
3	abc	J	3	bla	bla
4	abc	J	4	bla	bla
...	...	...	...	...	...
12	abc	X	12	bla	bla
13	abc	C	13	bla	bla
...	...	...	...	...	...
45	cde	J	1	bla	bla
46	cde	C	2	bla	bla