I have a really large dataframe similar to this:

     CustomerId   Latitude   Longitude     
0.        a        x1         y1
1.        a        x2         y2
2.        b        x3         y3
3.        c        x4         y4

And I have a second dataframe that corresponds to a sample of the first one, like this:

     CustomerId   Latitude   Longitude     
0.        a         x1         y1
3.        c         x4         y4

My goal is to get a new dataframe just like the original, but with NaN instead of the coordinates in every row whose index doesn't exist in the second dataframe. This is the result I need:

     CustomerId   Latitude   Longitude     
0.        a        x1         y1
1.        a        NaN        NaN
2.        b        NaN        NaN
3.        c        x4         y4

I am new to Python and I haven't found any question like this one. Does anybody have an idea how to solve it?

Nocas

1 Answer

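First we create a boolean mask of the rows whose index also appears in df2, using pandas.Index.isin. Masking on CustomerId instead would keep every row of a customer that appears in df2 (row 1 here), which is not what the question asks for.

After that we use np.where and invert the mask with ~, so the rows that are not in df2 get NaN:

import numpy as np

mask = df.index.isin(df2.index)

df['Latitude']  = np.where(~mask, np.nan, df['Latitude'])
df['Longitude'] = np.where(~mask, np.nan, df['Longitude'])

print(df)
    CustomerId Latitude Longitude
0.0          a       x1        y1
1.0          a      NaN       NaN
2.0          b      NaN       NaN
3.0          c       x4        y4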
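
For anyone who wants to run this end to end, here is a minimal self-contained sketch that masks on the row index, as the question asks. The numeric coordinates are hypothetical stand-ins for x1..x4 and y1..y4, and it assigns through .loc on a copy rather than using np.where, so the original dataframe is left untouched:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric values standing in for x1..x4 / y1..y4.
df = pd.DataFrame(
    {
        "CustomerId": ["a", "a", "b", "c"],
        "Latitude": [10.0, 20.0, 30.0, 40.0],
        "Longitude": [1.0, 2.0, 3.0, 4.0],
    }
)

# The sample keeps the original index labels (0 and 3).
df2 = df.loc[[0, 3]]

# True for rows of df whose index label is also present in df2.
mask = df.index.isin(df2.index)

# Work on a copy and blank out the coordinates of all other rows.
result = df.copy()
result.loc[~mask, ["Latitude", "Longitude"]] = np.nan

print(result)
```

This only works if the sample still carries the original index labels; if df2 was created with reset_index, the labels no longer line up and the mask would be wrong.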

Explanation:
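np.where works as follows: np.where(condition, value if true, value if false). Here the condition is ~mask, so rows where ~mask is True get NaN and all other rows keep their original value.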
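
A small sketch of that element-wise behaviour on plain arrays, with made-up values chosen only for illustration:

```python
import numpy as np

# Hypothetical inputs: a boolean condition and the values to choose from.
condition = np.array([True, False, False, True])
values = np.array([1.5, 2.5, 3.5, 4.5])

# Where condition is True take the entry from `values`, otherwise NaN.
kept = np.where(condition, values, np.nan)

# Inverting the condition with ~ swaps which entries are replaced.
dropped = np.where(~condition, values, np.nan)

print(kept)
print(dropped)
```

Here kept holds the original values at positions 0 and 3 and NaN elsewhere, and dropped is the reverse.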

Erfan