Pandas dataframe apply lambda based on inputs from multiple columns

Question

Let's say I have a dataframe that looks like this:

How can I apply a lambda to the dataframe so that FullName = FirstName + ' ' + LastName? As far as I know, a lambda on a dataframe takes only one input. Thanks!

score 6 · Accepted Answer · answered Dec 05 '19 at 06:27


I think apply is not necessary here; just join the columns together with +:

df['FullName'] = df.FirstName + ' ' + df.LastName

Or use Series.str.cat:

df['FullName'] = df.FirstName.str.cat(df.LastName, sep=' ')

A solution with lambda is possible, but slow:

df['FullName'] = df.apply(lambda x: x.FirstName + ' ' + x.LastName, axis=1)
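
To make the three variants above easy to try, here is a minimal sketch. The question's original table is not shown, so the sample names and the two-row dataframe are assumptions made up for illustration; only the FirstName, LastName and FullName column names come from the question.

```python
import pandas as pd

# Hypothetical sample data (the question's own table is not available).
df = pd.DataFrame({
    "FirstName": ["Ada", "Alan"],
    "LastName": ["Lovelace", "Turing"],
})

# 1) Vectorized concatenation with +
plus = df["FirstName"] + " " + df["LastName"]

# 2) Series.str.cat with a separator
cat = df["FirstName"].str.cat(df["LastName"], sep=" ")

# 3) Row-wise apply: the lambda receives one row (a Series) and can
#    read any number of columns from it
applied = df.apply(lambda row: row["FirstName"] + " " + row["LastName"], axis=1)

df["FullName"] = plus
print(df)
```

With axis=1 the lambda still has a single argument, but that argument is the whole row, which is how it can use several columns at once.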


jezrael


Perfect! That worked :-) Unfortunately, I must use lambda because my real problem is more complex (it has an if statement); I posted this example for simplicity. A follow-up question regarding speed: is lambda slow in general, or only in this specific case (when applied to dataframes)? – Chadee Fouad Dec 05 '19 at 06:59

@ShadyMBA - It depends on the formula. But generally, if you use `apply` it is slow, because it loops under the hood. – jezrael Dec 05 '19 at 07:11
So is there a way to do an if statement on a column without using lambda? The example you gave, `df['FullName'] = df.FirstName + ' ' + df.LastName`, is very elegant, but I don't see a way to do it when I need an if statement. – Chadee Fouad Dec 05 '19 at 07:23

@ShadyMBA - then use [this](https://stackoverflow.com/questions/19913659) – jezrael Dec 05 '19 at 07:24
So I've applied your method and this works properly: `df['GlobalName'][df['GlobalName']==''] = df.apply(lambda x: x['CleanName'] if x['IsPerson'] == True else '', axis = 1)` However, I've heard that a list comprehension is 50% faster, so I was trying to translate the above code to a list comprehension, but I'm getting the error "SyntaxError: invalid syntax". Any suggestions to fix the error? Here's my attempted code: `df['GlobalName'][df['GlobalName']==''] = [x['CleanName'] if x['IsPerson'] == True else '' for x in df, axis = 1]` Thanks! – Chadee Fouad Dec 06 '19 at 04:53
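
On the SyntaxError: a list comprehension has no `axis` argument, and iterating over a dataframe directly yields its column names rather than its rows. One possible way to write the comprehension is to zip the two columns together, as in the sketch below. The three sample rows are invented for illustration, and `.loc` with a boolean mask is used instead of the chained `df[...][...] = ...` assignment, which can trigger pandas' SettingWithCopyWarning.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the comment.
df = pd.DataFrame({
    "CleanName": ["Ada Lovelace", "Acme Corp", "Alan Turing"],
    "IsPerson": [True, False, True],
    "GlobalName": ["", "", "A. Turing"],
})

# Iterate over the two columns in parallel instead of over df itself.
candidates = [
    name if is_person else ""
    for name, is_person in zip(df["CleanName"], df["IsPerson"])
]

# Fill only the rows whose GlobalName is currently empty.
empty = df["GlobalName"] == ""
df.loc[empty, "GlobalName"] = pd.Series(candidates, index=df.index)[empty]
print(df)
```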

@ShadyMBA - Can you test `df['GlobalName'] = np.where((df['GlobalName']=='') & (df['IsPerson']), df['CleanName'], '')` ? – jezrael Dec 06 '19 at 06:16
Your code made the entire column = ''. But with a little tweak it worked as I wanted: `df['GlobalName'][df['GlobalName']==''] = np.where(df['IsPerson']==True, df['CleanName'], '')` Basically, it applies the formula ONLY to empty rows; I think your code was overwriting existing cells. Your second solution is much more elegant than the lambda one, and on top of that it is faster!! You're a genius. Thanks! – Chadee Fouad Dec 06 '19 at 07:39
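
The overwriting described here comes from the third argument of `np.where`: every row where the condition is False receives that value, so passing '' blanks any cell that already had a name. A small sketch of the difference, using made-up sample rows (only the column names are taken from the comments):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "CleanName": ["Ada Lovelace", "Acme Corp", "Alan Turing"],
    "IsPerson": [True, False, True],
    "GlobalName": ["", "", "A. Turing"],
})

mask = (df["GlobalName"] == "") & df["IsPerson"]

# Fallback '' replaces every non-matching row, including filled ones.
overwriting = np.where(mask, df["CleanName"], "")

# Falling back to the existing column keeps filled cells untouched.
preserving = np.where(mask, df["CleanName"], df["GlobalName"])

df["GlobalName"] = preserving
print(overwriting.tolist())
print(preserving.tolist())
```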
@ShadyMBA - You are welcome! Thanks, glad to help. Don't forget to accept the answer, if it suits you! :) – jezrael Dec 06 '19 at 07:39

3 solutions, timed with %timeit:
1. `df['GlobalName'][df['GlobalName']==''] = df.apply(lambda x: x['CleanName'] if x['IsPerson'] == True else '', axis = 1)`: 39.6 ms
2. `df['GlobalName'][df['GlobalName']==''] = np.where(df['IsPerson']==True, df['CleanName'], '')`: 40.6 ms
3. `df['GlobalName'] = np.where((df['IsPerson']==True) & (df['GlobalName']==''), df['CleanName'], df['GlobalName'])`: 1.21 ms
The 3rd solution is super fast compared to the other 2. – Chadee Fouad Dec 06 '19 at 07:59
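
For anyone who wants to repeat this kind of comparison outside IPython, here is a rough sketch using the standard-library `timeit` module. The synthetic data, the row count and the helper names `with_apply` and `with_where` are all assumptions for illustration; the timings will differ by machine and pandas version, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data; size and values are arbitrary choices.
n = 2_000
df = pd.DataFrame({
    "CleanName": [f"name{i}" for i in range(n)],
    "IsPerson": [i % 2 == 0 for i in range(n)],
    "GlobalName": ["" if i % 3 else f"kept{i}" for i in range(n)],
})

def with_apply(frame):
    # Calls the Python lambda once per row.
    return frame.apply(
        lambda x: x["CleanName"]
        if x["IsPerson"] and x["GlobalName"] == ""
        else x["GlobalName"],
        axis=1,
    )

def with_where(frame):
    # Evaluates the condition on whole columns at once.
    mask = frame["IsPerson"] & (frame["GlobalName"] == "")
    return np.where(mask, frame["CleanName"], frame["GlobalName"])

apply_seconds = timeit.timeit(lambda: with_apply(df), number=3)
where_seconds = timeit.timeit(lambda: with_where(df), number=3)
print(f"apply: {apply_seconds:.4f}s, np.where: {where_seconds:.4f}s")
```

Both helpers return the same values without modifying df, so each timed call starts from identical input.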
