Replace some values in a dataframe with NaN's if the index of the row does not exist in another dataframe

Question

I have a really large dataframe similar to this:

     CustomerId   Latitude   Longitude     
0.        a        x1         y1
1.        a        x2         y2
2.        b        x3         y3
3.        c        x4         y4

And I have a second dataframe that corresponds to a sample of the first one, like this:

     CustomerId   Latitude   Longitude     
0.        a         x1         y1
3.        c         x4         y4

My goal is to get a new dataframe just like the original, but with NaN in place of the coordinates for every row whose index does not exist in the second dataframe. This is the result I need:

     CustomerId   Latitude   Longitude     
0.        a        x1         y1
1.        a        NaN        NaN
2.        b        NaN        NaN
3.        c        x4         y4

I am new to Python and I haven't found any question like this one. Does anybody have an idea of how to solve it?

It's the first column, I edited the question because it wasn't clear. Thanks for pointing it out! — Nocas, Mar 17 '19 at 23:50
Thanks, your expected output is not correct now, could you edit that as well? See my answer — Erfan, Mar 17 '19 at 23:56

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

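First we create a mask with pandas.Series.isin, which marks the rows of df whose CustomerId also appears in df2.

After that we use np.where and invert the mask with ~, so that the rows not found in df2 get NaN.

import numpy as np

mask = df.CustomerId.isin(df2.CustomerId)

# np.nan is used here because the np.NaN alias was removed in NumPy 2.0
df['Latitude']  = np.where(~mask, np.nan, df['Latitude'])
df['Longitude'] = np.where(~mask, np.nan, df['Longitude'])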

print(df)
    CustomerId Latitude Longitude
0.0          a       x1        y1
1.0          a       x2        y2
2.0          b      NaN       NaN
3.0          c       x4        y4
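
The mask above matches on CustomerId, which is why row 1 keeps its coordinates. The question asks to match on the row index instead, so here is a minimal sketch of that variant. It assumes df2 keeps the original index labels of df (0 and 3 in the example), and the numeric coordinates are made-up placeholders standing in for x1..x4 and y1..y4.

```python
import numpy as np
import pandas as pd

# Placeholder data: the numbers stand in for x1..x4 / y1..y4 from the question.
df = pd.DataFrame(
    {
        "CustomerId": ["a", "a", "b", "c"],
        "Latitude": [10.0, 20.0, 30.0, 40.0],
        "Longitude": [1.0, 2.0, 3.0, 4.0],
    }
)
# Assumption: the sample keeps the original row labels (here 0 and 3).
df2 = df.loc[[0, 3]]

# True for rows whose index label also appears in df2.
in_sample = df.index.isin(df2.index)

# Work on a copy so the original dataframe stays untouched.
result = df.copy()
# Set both coordinate columns to NaN for rows missing from the sample.
result.loc[~in_sample, ["Latitude", "Longitude"]] = np.nan
print(result)
```

With this mask, rows 1 and 2 are set to NaN and rows 0 and 3 keep their coordinates, which matches the expected output in the question.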

Explanation:
np.where works as follows: np.where(condition, value if true, value if false)
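
A small standalone illustration of that signature may help. The arrays below are invented for the example and are not taken from the question.

```python
import numpy as np

# Made-up values and a made-up boolean condition of the same length.
values = np.array([1.5, 2.5, 3.5])
condition = np.array([True, False, True])

# Where condition is True take np.nan, otherwise keep the original value.
out = np.where(condition, np.nan, values)
print(out)  # [nan 2.5 nan]
```

In the answer, the condition is ~mask, so NaN is chosen exactly for the rows that are not in df2.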
