
I have a DataFrame with two columns, and I want to create a third column based on their values: the third column should contain "original" where the value in col_a equals the value in col_b, and "replica" otherwise.

Example:

col_a col_b 
1234  1234  
1235  1234  
1236  1234  
1237  1234  
1321  1321  

Expected Outcome:

col_a col_b type
1234  1234  original
1235  1234  replica
1236  1234  replica
1237  1234  replica
1321  1321  original

I tried the following code, but it doesn't seem to work.

type = []

for x in df['col_a'] and y in df['col_b']:
    if x == y:
        type.append('original') 
    else:
        type.append('replica') 
        
df['type'] = [type]        

I am a newbie in Python, so I might be overlooking some crucial basic steps.
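For reference, the loop above does not iterate over two columns at once. Python reads `for x in df['col_a'] and y in df['col_b']` as a single loop over the expression `df['col_a'] and y in df['col_b']`, and applying `and` to a Series raises a ValueError because the truth value of a Series is ambiguous. Separately, `df['type'] = [type]` wraps the list in another list, so it has one element instead of one per row. A minimal sketch of the same loop that works, assuming the sample data from the question, pairs the columns with `zip` (the name `labels` is just an illustrative choice; `type` shadows the built-in):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "col_a": [1234, 1235, 1236, 1237, 1321],
    "col_b": [1234, 1234, 1234, 1234, 1321],
})

labels = []  # avoid calling this `type`, which shadows the built-in
# zip pairs the two columns element by element: (1234, 1234), (1235, 1234), ...
for x, y in zip(df["col_a"], df["col_b"]):
    if x == y:
        labels.append("original")
    else:
        labels.append("replica")

# Assign the list itself (not [labels]) so there is one label per row
df["type"] = labels
print(df)
```

This works, but an explicit Python loop is usually slower than the vectorized approaches in the answers on large frames.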

Abir

2 Answers


Use numpy.where, which takes 'original' where the condition is True and 'replica' where it is False:

import numpy as np
df['type'] = np.where(df['col_a'].eq(df['col_b']), 'original', 'replica')

output:

   col_a  col_b      type
0   1234   1234  original
1   1235   1234   replica
2   1236   1234   replica
3   1237   1234   replica
4   1321   1321  original
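A self-contained sketch of the same numpy.where approach, with the sample frame built inline so it can be run as-is. It also keeps the intermediate comparison in a variable (`same`, a name chosen here for illustration) to show that `.eq` returns a boolean Series, which is what numpy.where uses to choose between the two labels:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "col_a": [1234, 1235, 1236, 1237, 1321],
    "col_b": [1234, 1234, 1234, 1234, 1321],
})

# Element-wise comparison: a boolean Series, True where the columns match
same = df["col_a"].eq(df["col_b"])

# np.where picks "original" where `same` is True and "replica" elsewhere
df["type"] = np.where(same, "original", "replica")
print(df)
```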
mozway

This is a possible solution. We use .loc to select the rows that meet a condition (here, df["col_a"] == df["col_b"]) and assign the value "original" to the column "type"; the remaining rows get "replica":

equal_rows = df["col_a"] == df["col_b"]
df.loc[equal_rows, "type"] = "original"
df.loc[~equal_rows, "type"] = "replica" # "Not equal rows" = Rows that are different
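A runnable sketch of the .loc approach above, assuming the sample data from the question. The comments spell out what each step does: the first masked assignment creates the "type" column (rows outside the mask are left as NaN until the second assignment fills them), and `~` inverts the boolean mask:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "col_a": [1234, 1235, 1236, 1237, 1321],
    "col_b": [1234, 1234, 1234, 1234, 1321],
})

# Boolean mask: True on rows where the two columns are equal
equal_rows = df["col_a"] == df["col_b"]

# .loc[mask, column] assigns only to the selected rows; this creates "type"
df.loc[equal_rows, "type"] = "original"
# ~ flips True/False, selecting the rows where the values differ
df.loc[~equal_rows, "type"] = "replica"
print(df)
```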
aaossa
  • this works, but it's too bad to perform the (almost) same comparison twice ;) – mozway Mar 21 '22 at 15:43
  • Changed it to store the comparison in a Series and then use the series to index the dataframe, but it still requires two lookups – aaossa Mar 21 '22 at 15:50
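One way to avoid the second masked assignment discussed in these comments is to fill the whole column with the default label first and then overwrite only the matching rows. This is a sketch of that variant, assuming the sample data from the question; it still uses .loc, just once:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "col_a": [1234, 1235, 1236, 1237, 1321],
    "col_b": [1234, 1234, 1234, 1234, 1321],
})

# Start every row as "replica"...
df["type"] = "replica"
# ...then overwrite only the rows where the columns match:
# one comparison and one masked assignment
df.loc[df["col_a"] == df["col_b"], "type"] = "original"
print(df)
```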