Pandas check if two columns are the same

Question

df = {'A':[3, 4, 5, np.nan, 6, 7],
        'B':[np.nan, 4, np.nan, np.nan, 6, 7]}

I have a data frame with two columns, A and B. I want to create a new column, C, which is the result of checking whether whether A and B are the same, if they are then keep it but if one is NaN, then keep the other value. Column A and B are always either a value or NaN. The values in A and B are always the same.

I know how to check whether A and B are the same:

df['C'] = (df['A'] == df['B]).astype('object')

But this gives a boolean answer in column C whether it's true or false. My expected output would be:

A   B   C
3  NaN  3
4   4   4
5  NaN  5
NaN NaN NaN
6   6   6
7   7   7

What is required when columns A and B have different value(s) i.e. not NaN and Not Equal? — Mahendra Singh, Oct 16 '19 at 09:09
Ah, yes - I'll edit the post, but they'll always be NaN or a value, the values will always be the same. — user47467, Oct 16 '19 at 09:11

Joe · Accepted Answer · 2019-10-16T11:56:14.023

You can use np.where()

>>> df = pd.DataFrame({'A':[3, 4, 5, np.nan],'B':[np.nan,4,np.nan,np.nan]})
>>> df
     A    B
0  3.0  NaN
1  4.0  4.0
2  5.0  NaN
3  NaN  NaN

>>> df['C'] = np.where(df['A'].isna(), df['B'], df['A'])
>>> df
     A    B    C
0  3.0  NaN  3.0
1  4.0  4.0  4.0
2  5.0  NaN  5.0
3  NaN  NaN  NaN

Edited Sample

Showing that it would work if df['A'] is nan and df['B'] has value.

>>> df = pd.DataFrame({'A':[3, np.nan, 5, np.nan],'B':[np.nan,4,np.nan,np.nan]})
>>> df
     A    B
0  3.0  NaN
1  NaN  4.0
2  5.0  NaN
3  NaN  NaN

>>> df['C'] = np.where(df['A'].isna(), df['B'], df['A'])
>>> df
     A    B    C
0  3.0  NaN  3.0
1  NaN  4.0  4.0
2  5.0  NaN  5.0
3  NaN  NaN  NaN

Thanks :D

Tag it using `isna()`. Updating answer. – Joe Oct 16 '19 at 12:03 — Joe, Oct 16 '19 at 12:03

score 3 · Answer 2 · answered Oct 16 '19 at 09:18

Use np.select where you can check multiple conditions.

df = pd.DataFrame({'A':[3, 4, 5, np.nan, 6, np.nan],
    'B':[np.nan, 4, np.nan, np.nan, 6, 7]})

df['c'] = np.select([df['A'].isnull() & df['B'].isnull(), df['A'].isnull()], 
                     [np.nan, df['B']], df['A'])

Output:

    A    B       c
0   3.0  NaN     3.0
1   4.0  4.0     4.0
2   5.0  NaN     5.0
3   NaN  NaN     NaN
4   6.0  6.0     6.0
5   NaN  7.0     7.0

score 2 · Answer 3 · answered Oct 16 '19 at 09:22

2

If it's guaranteed that A & B are identical values when not nans, then it looks like you could use .combine_first here:

df['C'] = df.A.combine_first(df.B)

answered Oct 16 '19 at 09:22

Jon Clements

138,671
33
247
280

score 1 · Answer 4 · answered Oct 16 '19 at 09:32

1

I think fillna is sufficent for your requirement

df['C'] = df.A.fillna(df.B)

Out[92]:
     A    B    C
0  3.0  NaN  3.0
1  4.0  4.0  4.0
2  5.0  NaN  5.0
3  NaN  NaN  NaN
4  6.0  6.0  6.0
5  7.0  7.0  7.0

answered Oct 16 '19 at 09:32

Andy L.

24,909
4
17
29

Pandas check if two columns are the same

4 Answers4