1

I have two data frames df and df_copy. I would like to copy the data from df_copy, but only if the data is also identical. How do I do that?

import pandas as pd

d = {'Nameid': [100, 200, 300, 100]
     , 'Name': ['Max', 'Michael', 'Susan', 'Max']
     , 'Projectid': [100, 200, 200, 100]}

df = pd.DataFrame(data=d)
display(df.head(5))

df['nameid_index'] = df['Nameid'].astype('category').cat.codes
df['projectid_index'] = df['Projectid'].astype('category').cat.codes
display(df.head(5))

df_copy = df.copy()

df.drop(['Nameid', 'Name', 'Projectid'], axis=1, inplace=True)
df = df.drop([1, 3])
display(df.head(5))

df

enter image description here

df_copy

enter image description here

What I want

enter image description here


I looked at Pandas Merging 101

df.merge(df_copy, on=['nameid_index', 'projectid_index'])

But I got this result

enter image description here

The same row are twice, I only want once.

  • 1
    So need `df.drop_duplicates(['nameid_index', 'projectid_index']).merge(df_copy, on=['nameid_index', 'projectid_index'])` ? – jezrael Nov 05 '20 at 12:11
  • @jezrael thank you very much! Didn't know that I combine this. Thank you. –  Nov 05 '20 at 12:12
  • Does this answer your question? [pandas: merge (join) two data frames on multiple columns](https://stackoverflow.com/questions/41815079/pandas-merge-join-two-data-frames-on-multiple-columns) – Rishabh Deep Singh Nov 05 '20 at 12:16

1 Answers1

2

Use DataFrame.drop_duplicates first:

df1 = (df.drop_duplicates(['nameid_index', 'projectid_index'])
         .merge(df_copy, on=['nameid_index', 'projectid_index']))

If need merge by intersection of columns names in both DataFrames, on parameter should be removed:

df1 = df.drop_duplicates(['nameid_index', 'projectid_index']).merge(df_copy)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252