Pandas: how to add a column based on two other columns meeting a certain condition

Question

I have this data in which I have a column that indicates a color and another one that indicates a letter. If the color and the letter 'belong' together, then the data is correct so a new column should state a C. Otherwise, it should state an I.

I did it like this but the thing is, this only puts all the correct ones at the top and the incorrect ones at the bottom:

#correct
c1 = df['color'].eq('green') & df['value'].eq('V')
c2 = df['color'].eq('blue') & df['value'].eq('A')
c3 = df['color'].eq('red') & df['value'].eq('R')
m = c1 | c2 | c3

correct_df = df.loc[m, ['Person ID','word', 'rt', 'color']]

correct_df['accuracy'] = 'C'

incorrect_df = df.loc[~m, ['word', 'rt', 'color']]
incorrect_df['accuracy'] = 'I'

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df_cor_inc = pd.concat([correct_df, incorrect_df])

What I need instead is for the new column to be added alongside the existing ones, saying whether the response was correct or not, with the rows kept in the order the data is already in.

This is a sample of the data:

Person ID  value  word    color  correct  rt
0           R     FLOWER  red     r       1223
0           B     CAR     blue    b       33    
1           G     KNIFE   blue    b       333
1           R     CAT     red     r       2332  
2           B     CHILD   green   g       232

This is how I want it to look:

Person ID  value  word    color  correct  rt    accuracy
0           R     FLOWER  red     r       1223  C
0           B     CAR     blue    b       33    C
1           G     KNIFE   blue    b       333   I
1           R     CAT     red     r       2332  C
2           B     CHILD   green   g       232   I

IIUC, use np.where like `df['accuracy'] = np.where(m, 'C','I')` — Ben.T, Oct 07 '21 at 16:44

SeaBean · Accepted Answer · 2021-10-07T16:54:23.920


Reusing your boolean mask m, we can use np.where() as follows:

import numpy as np
df['accuracy'] = np.where(m, 'C', 'I')

np.where() acts like an element-wise if-then-else. Where the condition in the first parameter is True, it takes the value from the second parameter ('C' here); otherwise, it takes the value from the third parameter ('I' here). Because the result is assigned as a new column on the original df, the rows stay in their existing order.

Result:

print(df)

   Person ID value    word  color correct    rt accuracy
0          0     R  FLOWER    red       r  1223        C
1          0     B     CAR   blue       b    33        I
2          1     G   KNIFE   blue       b   333        I
3          1     R     CAT    red       r  2332        C
4          2     B   CHILD  green       g   232        I
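
To run this end to end, here is a minimal sketch. The DataFrame is reconstructed from the sample rows posted in the question (an assumption, since the real data is presumably larger), and the mask is copied unchanged from the question:

```python
import numpy as np
import pandas as pd

# Sample rows reconstructed from the question (assumption: the real data has more rows).
df = pd.DataFrame({
    "Person ID": [0, 0, 1, 1, 2],
    "value": ["R", "B", "G", "R", "B"],
    "word": ["FLOWER", "CAR", "KNIFE", "CAT", "CHILD"],
    "color": ["red", "blue", "blue", "red", "green"],
    "correct": ["r", "b", "b", "r", "g"],
    "rt": [1223, 33, 333, 2332, 232],
})

# Same mask as in the question: a row counts as correct when
# the color and the letter form one of these pairs.
c1 = df["color"].eq("green") & df["value"].eq("V")
c2 = df["color"].eq("blue") & df["value"].eq("A")
c3 = df["color"].eq("red") & df["value"].eq("R")
m = c1 | c2 | c3

# Element-wise if/else: 'C' where the mask is True, 'I' elsewhere.
df["accuracy"] = np.where(m, "C", "I")
print(df)
```

The labels depend entirely on the pairs listed in the mask, so the letters there need to match the ones that actually appear in the value column.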

edited Oct 07 '21 at 16:54

answered Oct 07 '21 at 16:48

SeaBean


Why not use a list comprehension? It is very powerful for manipulating dataframes: `df['accuracy'] = ['C' if x.lower() == y else 'I' for (x,y) in zip(df.value, df.correct)]` – Esa Tuulari Oct 07 '21 at 17:21
@EsaTuulari Thanks for your question. A list comprehension is a good tool and relatively fast, so we sometimes use it with Pandas too. One drawback is its handling of `NaN` values, which are common in Pandas data: the built-in Pandas and NumPy functions (like the `np.where()` here) are designed to handle `NaN` values. The other main point is that Pandas and NumPy functions are mostly optimized: the underlying processing runs in C rather than Python, and they are designed for vectorized operations that work on whole arrays at once instead of looping element by element in Python. – SeaBean Oct 07 '21 at 18:28
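
To illustrate the `NaN` point from the comments, here is a small sketch. The frame `df_nan` and its missing response are hypothetical and not taken from the question's data. It compares the list comprehension with a vectorized equivalent built on the `.str` accessor and `np.where()`:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row has a missing response in 'value'.
df_nan = pd.DataFrame({
    "value": ["R", np.nan, "G"],
    "correct": ["r", "b", "b"],
})

# The list comprehension calls .lower() on every element, so it fails
# as soon as it reaches the missing value, which is not a string.
try:
    labels = ["C" if x.lower() == y else "I"
              for x, y in zip(df_nan["value"], df_nan["correct"])]
    comprehension_failed = False
except AttributeError:
    comprehension_failed = True

# Vectorized version: .str.lower() leaves the missing value missing,
# and .eq() treats it as "not equal", so the row is labelled 'I'.
match = df_nan["value"].str.lower().eq(df_nan["correct"])
df_nan["accuracy"] = np.where(match, "C", "I")

print(comprehension_failed)
print(df_nan)
```

On the question's sample rows, which have no missing values, both approaches compare value against correct and yield the same labels.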
