How to create a new column in a DataFrame based on values of two other columns

Question

I have a DataFrame with two columns, and I want to create a third column based on their values. The third column should say original where the value in col_a equals the value in col_b, and replica otherwise.

Example:

col_a col_b 
1234  1234  
1235  1234  
1236  1234  
1237  1234  
1321  1321

Expected Outcome:

col_a col_b type
1234  1234  original
1235  1234  replica
1236  1234  replica
1237  1234  replica
1321  1321  original

I tried the following code, but it doesn't seem to work.

type = []

for x in df['col_a'] and y in df['col_b']:
    if x == y:
        type.append('original') 
    else:
        type.append('replica') 
        
df['type'] = [type]

I am a newbie in Python, so I might be overlooking some crucial basic steps.

Try `for x, y in zip(df['col_a'], df['col_b'])` – Gejun, Mar 21 '22 at 15:42
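A minimal sketch of what the loop from the question could look like with the `zip` suggestion applied. The sample frame is rebuilt from the question's example, and the list name `labels` is an assumption chosen here to avoid shadowing the built-in `type`. Note that the list is assigned directly (`labels`, not `[labels]`), so it has one entry per row.

```python
import pandas as pd

# Sample data copied from the question's example
df = pd.DataFrame({
    "col_a": [1234, 1235, 1236, 1237, 1321],
    "col_b": [1234, 1234, 1234, 1234, 1321],
})

labels = []  # hypothetical name; avoids shadowing the built-in `type`

# zip pairs the values of both columns row by row
for x, y in zip(df["col_a"], df["col_b"]):
    if x == y:
        labels.append("original")
    else:
        labels.append("replica")

# Assign the flat list, one label per row
df["type"] = labels
print(df)
```

This works for small frames, but an explicit Python loop is usually slower than a vectorized comparison on large data.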

score 1 · Accepted Answer · answered Mar 21 '22 at 15:42

Use numpy.where:

import numpy as np
df['type'] = np.where(df['col_a'].eq(df['col_b']), 'original', 'replica')

output:

   col_a  col_b      type
0   1234   1234  original
1   1235   1234   replica
2   1236   1234   replica
3   1237   1234   replica
4   1321   1321  original
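For anyone who wants to reproduce the output above, here is a self-contained sketch that rebuilds the sample frame from the question and applies the same `np.where` call. Only the frame construction is added; the rest follows the answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's example
df = pd.DataFrame({
    "col_a": [1234, 1235, 1236, 1237, 1321],
    "col_b": [1234, 1234, 1234, 1234, 1321],
})

# Element-wise comparison gives a boolean Series;
# np.where picks 'original' where it is True and 'replica' elsewhere
df["type"] = np.where(df["col_a"].eq(df["col_b"]), "original", "replica")
print(df)
```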

aaossa · Answer 2 · 2022-03-21T15:49:47.403

0

This is a possible solution. We use .loc to select the rows that meet a condition (say, df["col_a"] == df["col_b"]) and assign the value "original" to the column "type"; the remaining rows get "replica":

equal_rows = df["col_a"] == df["col_b"]
df.loc[equal_rows, "type"] = "original"
df.loc[~equal_rows, "type"] = "replica" # "Not equal rows" = Rows that are different

edited Mar 21 '22 at 15:49

answered Mar 21 '22 at 15:42


1

this works, but it's too bad to perform the (almost) same comparison twice ;) – mozway Mar 21 '22 at 15:43
Changed it to store the comparison in a Series and then use the series to index the dataframe, but it still requires two lookups – aaossa Mar 21 '22 at 15:50
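One way to address the concern in the comments above, sketched under the assumption that a single assignment is preferred: compute the boolean Series once and translate it with `Series.map`. This is not taken from either answer; it is an alternative using the same `equal_rows` name, with the sample frame rebuilt from the question.

```python
import pandas as pd

# Sample data copied from the question's example
df = pd.DataFrame({
    "col_a": [1234, 1235, 1236, 1237, 1321],
    "col_b": [1234, 1234, 1234, 1234, 1321],
})

# One comparison, stored as a boolean Series
equal_rows = df["col_a"] == df["col_b"]

# Map True/False to the two labels in a single assignment
df["type"] = equal_rows.map({True: "original", False: "replica"})
print(df)
```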
