Merging pandas dataframe on unique values in a column

Question

I have a df1 as:

There are a lot of duplicating values for SUBJECT_ID as shown in the picture. I have a df2 to merge from, but I want to merge it on unique SUBJECT_ID. For now I only know how to merge to entire SUBJECT_ID through this code:

df1 = pd.merge(df1,df2[['SUBJECT_ID', 'VALUE']], on='SUBJECT_ID', how='left' )

But this will merge on every SUBJECT_ID. I just need unique SUBJECT_ID. Please help me with this.

Add code, errors, and data as text, not screenshots because [Stack Overflow Discourages Screenshots](https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors). It is likely the question will be down-voted. You are discouraging assistance because no one wants to retype your data or code, and screenshots are often illegible. [edit] the question and add text. — Trenton McKinney, Jul 16 '20 at 03:15
Please [create a reproducible copy of the DataFrame with `df.head(10).to_clipboard(sep=',')`](https://stackoverflow.com/questions/52413246/how-to-provide-a-copy-of-your-dataframe-with-to-clipboard), [edit] the question, and paste the clipboard into a code block or include synthetic data: [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — Trenton McKinney, Jul 16 '20 at 03:16
If there are so many duplicates, `.drop_duplicates` or `groupby`. — Trenton McKinney, Jul 16 '20 at 03:18
In copy and pasteable format what is df1 and df1 (include that in oyur question). Then, what is your desired output?: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — David Erickson, Jul 16 '20 at 04:12
Like everyone is saying, create a short dummy dataframe using code so we can just copy/paste it instead of having to retype it. Example: `SUBJECT_ID = [31, 31, 31]` ... `GENDER = ['M', 'M', 'M']` ... etc. ... `df1 = pd.DataFrame(list(zip(SUBJECT_ID, GENDER)), columns =['SUBJECT_ID', 'GENDER'])` — a11, Jul 16 '20 at 04:23

score 2 · Answer 1 · answered Jul 16 '20 at 04:20

I think you will find your answer with the merge documentation.

It's not fully clear what you want, but here are some examples that may contain the answer you are looking for:

import pandas as pd
df1 = pd.read_csv('temp.csv')
display(df1)

SUBJECT_ID = [31, 32, 33]
something_interesting = ['cat', 'dog', 'fish']
df2 = pd.DataFrame(list(zip(SUBJECT_ID, something_interesting)), 
                   columns =['SUBJECT_ID', 'something_interesting']) 
display(df2)

df_keep_all = df1.merge(df2, on='SUBJECT_ID', how='outer')
display(df_keep_all)

df_keep_df1 = df1.merge(df2, on='SUBJECT_ID', how='inner')
display(df_keep_df1)

df_thinned = pd.merge(df1.drop_duplicates(), df2, on='SUBJECT_ID', how='inner')
display(df_thinned)

score 1 · Answer 2 · edited Nov 03 '21 at 10:51

1

You can use pandas drop function for it using this function you can remove all duplicate values for column or columns.

df2 = df.drop_duplicates(subset=['SUBJECT_ID'])`

edited Nov 03 '21 at 10:51

tlentali

3,407
2
14
21

answered Jul 16 '20 at 06:15

Jeel Gondaliya

117
9

Merging pandas dataframe on unique values in a column

2 Answers2