I would like to create a new column in a dataframe equal to column A if column C = 'a', and column B if column C = 'b'. I have implemented this:

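import numpy as np

def f(row):
    # Use == for comparison; a single = is a syntax error here
    if row['C'] == 'a':
        return row['A']
    elif row['C'] == 'b':
        return row['B']
    return np.nan

# axis=1 calls f once per row
df['new'] = df.apply(f, axis=1)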

I feel as though the code runs slowly. The answer here explains that this is not vectorized.

Alternatively:

# .ix has been removed from pandas; .loc takes a boolean mask and a column label
df.loc[df['C']=='a', 'new'] = df.loc[df['C']=='a', 'A']
df.loc[df['C']=='b', 'new'] = df.loc[df['C']=='b', 'B']

Is this vectorized? Is there a different 'correct' way of doing this in pandas? What would a vectorized function do differently?

KieranPC
    `apply` is just a loop and should be avoided where possible. Your solution is fine; there are many ways of doing what you want, and the best one depends on the size of the data. Your sample code could be simplified to `df.loc[df['C']=='a', 'new'] = df['A']`, and likewise for the other condition. – EdChum Sep 19 '14 at 13:54

1 Answer

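Try:

import numpy as np

df["new"] = np.nan
# Assign through .loc with a boolean mask; chained indexing such as
# df["new"][mask] = ... may write to a temporary copy instead of df
df.loc[df["C"] == 'a', "new"] = df["A"]
df.loc[df["C"] == 'b', "new"] = df["B"]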
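For readers who miss nested 'if' statements, one possible alternative is `np.select`, which takes a list of boolean masks and a matching list of values and evaluates them over whole columns at once. This is only a sketch and not part of the original answer; the sample data below is hypothetical and exists purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({
    "A": [1, 2, 3, 4],
    "B": [10, 20, 30, 40],
    "C": ["a", "b", "c", "a"],
})

# One boolean mask per condition; the first mask that matches a row wins
conditions = [df["C"] == "a", df["C"] == "b"]
# The column to take the value from when the matching condition holds
choices = [df["A"], df["B"]]

# Rows that match neither condition fall back to NaN
df["new"] = np.select(conditions, choices, default=np.nan)
```

Because the masks and the selection are computed on entire columns, no Python-level function is called per row, which is the main difference from the `apply` version in the question.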
Eric Wang
  • I like it as a simplified version of mine. On the other hand I'm nostalgic for Excel's nested 'if' statements. I don't know why! – KieranPC Jan 19 '15 at 19:38