Do a merge() on Date
and x
column:
import pandas as pd
df1 = pd.DataFrame({'Date': ['2-Jul', '2-Jul', '3-Jul'],
'x': ['Bob', 'Bob', 'Alice'],
'y': [5, 9, 7]})
df2 = pd.DataFrame({'Date': ['2-Jul', '3-Jul'],
'x': ['Bob', 'Alice'],
'z': [2, 8]})
df3 = pd.merge(df1, df2, on=['Date', 'x'])
# print(df3)
Date x y z
0 2-Jul Bob 5 2
1 2-Jul Bob 9 2
2 3-Jul Alice 7 8
pandas.DataFrame.duplicated() returns boolean Series denoting duplicate rows. keep=first
marks duplicates as True except for the first occurrence. The default value of keep
is first, so you can omit this.
pandas.DataFrame.mask() replaces values where the condition is True.
df3['z'].mask(df3.duplicated(subset=['Date', 'x'], keep='first'), inplace=True)
# print(df3)
Date x y z
0 2-Jul Bob 5 2.0
1 2-Jul Bob 9 NaN
2 3-Jul Alice 7 8.0