I'm comparing two columns in a dataframe (A & B). I have a method that works (C5). It came from this question: Compare two columns using pandas

I wondered why I couldn't get the other methods (C1 - C4) to give the correct answer:

df = pd.DataFrame({'A': [1,1,1,1,1,2,2,2,2,2],
                   'B': [1,1,1,1,1,1,0,0,0,0]})

#df['C1'] = 1 [df['A'] == df['B']]

df['C2'] = df['A'].equals(df['B'])

df['C3'] = np.where((df['A'] == df['B']),0,1)

def fun(row):
    if ['A'] == ['B']:
        return 1
    else:
        return 0
df['C4'] = df.apply(fun, axis=1)

df['C5'] = df.apply(lambda x : 1 if x['A'] == x['B'] else 0, axis=1)

R. Cox

2 Answers

IIUC, C4 fails because the function never indexes into row. You need this:

def fun(row):
    if row['A'] == row['B']:
        return 1
    else:
        return 0
anky

Use:

import numpy as np
import pandas as pd
df = pd.DataFrame({'A': [1,1,1,1,1,2,2,2,2,2],
                   'B': [1,1,1,1,1,1,0,0,0,0]})

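In C2, Series.equals tests whether the two columns are identical as a whole and returns a single bool, which is then assigned to every row. For C1 and C2, compare the columns element-wise with == or eq to get a boolean mask, then convert it to integers (True, False to 1, 0):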

df['C1'] = (df['A'] == df['B']).astype(int)
df['C2'] = df['A'].eq(df['B']).astype(int)
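To make the difference between equals and eq concrete, here is a small sketch on the same data. The variable names whole, mask and as_int are only for illustration and are not part of the question.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                   'B': [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]})

# equals() compares the two Series as a whole and returns a single bool
whole = df['A'].equals(df['B'])

# eq() (the method form of ==) compares element by element
# and returns a boolean Series with one entry per row
mask = df['A'].eq(df['B'])

# Casting the boolean mask to int turns True/False into 1/0
as_int = mask.astype(int)

print(whole)
print(as_int.tolist())
```

Because whole is one value, assigning it to a column fills every row with that same value, which is why the original C2 cannot distinguish matching rows from non-matching ones.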

For C3, swap the order to 1,0: np.where returns its second argument where the condition is True, so a match needs 1:

df['C3'] = np.where((df['A'] == df['B']),1,0)
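A quick side-by-side of the two argument orders, as a sketch on the same data. The names match, c3_original and c3_fixed are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                   'B': [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]})

# Boolean mask: True where the two columns hold the same value
match = df['A'] == df['B']

# np.where(condition, x, y) picks x where condition is True, else y.
# The order from the question gives 0 for a match, i.e. the inverted flag.
c3_original = np.where(match, 0, 1)

# Swapping the last two arguments gives 1 for a match
c3_fixed = np.where(match, 1, 0)

print(c3_original.tolist())
print(c3_fixed.tolist())
```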

In the C4 function, row is missing, so no values are selected from the Series: ['A'] == ['B'] compares two one-element lists of strings and is always False:

def fun(row):
    if row['A'] == row['B']:
        return 1
    else:
        return 0
df['C4'] = df.apply(fun, axis=1)
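The sketch below runs both versions of the function on the same data to show the effect. The names fun_broken and fun_fixed are hypothetical, chosen here only to keep the two variants apart.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
                   'B': [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]})

# Two list literals holding different strings are never equal,
# regardless of what is in the dataframe
literal_lists_equal = ['A'] == ['B']

def fun_broken(row):
    # row is never used, so the result is the same for every row
    return 1 if ['A'] == ['B'] else 0

def fun_fixed(row):
    # row['A'] and row['B'] are the actual values of the current row
    return 1 if row['A'] == row['B'] else 0

broken = df.apply(fun_broken, axis=1)
fixed = df.apply(fun_fixed, axis=1)

print(literal_lists_equal)
print(broken.tolist())
print(fixed.tolist())
```

With axis=1, apply calls the function once per row, so this is usually slower than the vectorized comparison used for C1 on large frames.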

The C5 solution is already correct:

df['C5'] = df.apply(lambda x : 1 if x['A'] == x['B'] else 0, axis=1)
print(df)
   A  B  C1  C2  C3  C4  C5
0  1  1   1   1   1   1   1
1  1  1   1   1   1   1   1
2  1  1   1   1   1   1   1
3  1  1   1   1   1   1   1
4  1  1   1   1   1   1   1
5  2  1   0   0   0   0   0
6  2  0   0   0   0   0   0
7  2  0   0   0   0   0   0
8  2  0   0   0   0   0   0
9  2  0   0   0   0   0   0
jezrael