Select column based on condition on other column, vectorized, pandas

Question

I would like to create a new column in a dataframe equal to column A if column C = 'a', and column B if column C = 'b'. I have implemented this:

import numpy as np

def f(row):
    # Pick the value from A or B depending on the flag in C
    if row['C'] == 'a':
        return row['A']
    elif row['C'] == 'b':
        return row['B']
    return np.nan

df['new'] = df.apply(f, axis=1)

I feel as though the code runs slowly. The answer here explains that this is not vectorized.

Alternatively:

df.loc[df['C'] == 'a', 'new'] = df.loc[df['C'] == 'a', 'A']
df.loc[df['C'] == 'b', 'new'] = df.loc[df['C'] == 'b', 'B']

Is this vectorized? Is there a different 'correct' way of doing this in pandas? What would a vectorized function do differently?

apply is just a loop and should be avoided where possible. Your solution is fine, and there are many ways of doing what you want; which is best depends on the size of the data. Your sample code could be simplified: `df.loc[df['C']=='a', 'new'] = df['A']`, and likewise for the other condition. – EdChum Sep 19 '14 at 13:54

score 2 · Accepted Answer · answered Jan 16 '15 at 15:59

Try:

df["new"] = np.nan
# Assign through .loc rather than chained indexing, which can write to a copy
df.loc[df["C"] == 'a', "new"] = df["A"]
df.loc[df["C"] == 'b', "new"] = df["B"]
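For comparison, the same result can probably also be had in one step with `np.select`, which takes a list of boolean conditions and a matching list of choices. This is only a sketch and not part of the original answer; the sample frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the question
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [10.0, 20.0, 30.0, 40.0],
    "C": ["a", "b", "c", "a"],
})

# One boolean mask per case, evaluated on whole columns at once
conditions = [df["C"] == "a", df["C"] == "b"]
# The column to take values from when the matching condition holds
choices = [df["A"], df["B"]]

# Rows that match no condition fall back to NaN
df["new"] = np.select(conditions, choices, default=np.nan)
```

Each condition is evaluated on the entire column rather than row by row, which is what "vectorized" refers to here.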

Eric Wang

I like it as a simplified version of mine. On the other hand I'm nostalgic for Excel's nested 'if' statements. I don't know why! – KieranPC Jan 19 '15 at 19:38
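For anyone who misses the nested 'if' style, a rough analogue is to nest `np.where` calls, reading much like `IF(C="a", A, IF(C="b", B, NA))`. This is a sketch under assumed sample data, not something posted in the thread.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration only
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0],
    "B": [10.0, 20.0, 30.0],
    "C": ["a", "b", "x"],
})

# Outer where handles C == 'a'; the inner where handles C == 'b',
# and anything else becomes NaN
df["new"] = np.where(
    df["C"] == "a",
    df["A"],
    np.where(df["C"] == "b", df["B"], np.nan),
)
```

Beyond two or three cases the nesting gets hard to read, so a list-based approach may be easier to maintain.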
