0

so I want to merge two dataframes that look like this:

df1

id object position x y z
abc J 3 0 1 0
abc C 13 1 0 0
cde C 2 0 1 0

etc etc (df1 is much smaller than df2)

df2

id object position bla bla
1 abc D 1 bla bla
2 abc S 2 bla bla
3 abc J 3 bla bla
4 abc J 4 bla bla
... ... ... ... ... ...
12 abc X 12 bla bla
13 abc C 13 bla bla
... ... ... ... ... ...
45 cde J 1 bla bla
46 cde C 2 bla bla

etc etc etc

what I want is a code that can merge x y z from df1 into df2, using id, object and position to correctly map x y z, so my output would look like this

id object position bla bla x y z
1 abc D 1 bla bla 0 0 0
2 abc S 2 bla bla 0 0 0
3 abc J 3 bla bla 0 1 0
4 abc J 4 bla bla 0 0 0
... ... ... ... ... ... ... ...
12 abc X 12 bla bla 0 0 0
13 abc C 13 bla bla 1 0 0
... ... ... ... ... ... ... ...
45 cde J 1 bla bla 0 0 0
46 cde C 2 bla bla 0 1 0

so the stuff in bold are lines that are found in df1 where x, y, and z can be copied.

no matter what I try x y z in the merged tables ends up being full of 0s.

tried:

merged_df = df2.merge(df1, left_on=['id', 'Column A', 'position'], right_on=['id', 'object', 'position'], how='left')

merged_df = merged_df.drop(columns=['object'])
merged_df = merged_df.rename(columns={'x_x': 'x', 'y_x': 'y', 'z_x': 'z'})

merged_df = merged_df.fillna(0)

Everything ends up being 0

also tried:

merged_df = df1.merge(df2, on=['id', 'object', 'position'], how='left')

test_set_df['index'] = merged_df['index']

max_index_value = df2.index.max()
for index_value in range(1, max_index_value + 1):
    if index_value not in df1['index'].values:
        new_row = pd.DataFrame({'id': [''], 'object': [''], 'position': [''], 'x': [0], 'y': [0], 'z': [0], 'index': [index_value]})
        df1 = pd.concat([df1, new_row], ignore_index=True)

df1.sort_values(by='index', inplace=True)

df1.reset_index(drop=True, inplace=True)

df2.rename(columns={'x': 'x_alpha', 'y': 'y_alpha', 'z': 'z_alpha'}, inplace=True)

df2_annotated = df2

df2_annotated = df2_annotated.merge(test_set_df[['index', 'x', 'y', 'z']], on='index', how='left')
margarine
  • 11
  • 2
  • 1
    Which exact command did you use? Also provide a minimal reproducible example of your inputs, without `...` and the exact matching expected output. – mozway Jul 22 '23 at 20:21
  • Welcome to Stack Overflow! Please take the [tour] and read [ask]. For help with your code, you need to make a [mre], cause how are we supposed to know why your code isn't working if you haven't shown it? Please also make sure to provide complete example input and complete desired output, cause your input `df1` doesn't contain `cde, C, 2`. For specifics, see [How to make good reproducible pandas examples](/q/20109391/4518341). FWIW, there's an existing broad question: [Pandas Merging 101](/q/53645882/4518341) – wjandrea Jul 22 '23 at 20:24
  • I see your edits, but that's still not an MRE. To start, `df2` doesn't contain `Column A` anymore, columns ending in `_x` don't exist, and `df1` still doesn't contain `cde, C, 2`. – wjandrea Jul 22 '23 at 23:17

1 Answers1

0

merge both dataframes based on id and position columns

df_final = pd.merge(df2, df1, on=['id','position'], how='left')
df_final[['x', 'y', 'z']] = df_final[['x', 'y', 'z']].fillna(0)
print(df_final)
yashaswi k
  • 668
  • 7
  • 17
  • What about `"object"`? OP said that should be part of the merge too, though their output seems to say different – wjandrea Jul 22 '23 at 20:44
  • Hmm, the result gets converted to float. Do you know if there's a way to keep it as int? `marge` doesn't have a `fill_value` parameter so that doesn't work... – wjandrea Jul 22 '23 at 21:03
  • @wjandrea thanks for pointing it out , as output dataframe didn't contain object column haven't considered it while merging – yashaswi k Jul 22 '23 at 21:08
  • `df_final[['x', 'y', 'z']] = df_final[['x', 'y', 'z']].fillna(0).astype(int)` should do the job – yashaswi k Jul 22 '23 at 21:09
  • It seems like Column A corresponds to `"object"` (but again, the output doesn't match that) – wjandrea Jul 22 '23 at 21:09
  • Oh, I meant to not convert to float in the first place. NBD in this case but I'm just thinking it could cause problems in other cases. Anyway, I found a way to do it: `df1.set_index(cols_right).reindex(df2[cols_left], fill_value=0).pipe(lambda d1: df2.join(d1, on=cols_left, how='left'))` – wjandrea Jul 22 '23 at 21:14
  • yeah, then either of the column should be renamed for merging – yashaswi k Jul 22 '23 at 21:15
  • 1
    Oh, you don't need to rename them. You can just specify `left_on=` and `right_on=`, like `pd.merge(df2, df1, left_on=cols_left, right_on=cols_right, how='left')`, where `cols_common = ['id', 'position']; cols_left = cols_common + ['Column A']; cols_right = cols_common + ['object']` – wjandrea Jul 22 '23 at 21:16
  • Hiya, I tried doing this and x, y, z just merged left of the first few lines of df2 regardless of id, object and position and the rest of the table got filled with NaN, any reason which could be causing this? – margarine Jul 22 '23 at 22:20