I am trying to generate a third column in a pandas DataFrame from two existing columns. The requirement is specific to my scenario.

The requirement is stated as:

Let the DataFrame be df, with a first column 'first_name' and a second column 'last_name'. I need to generate a third column by formatting a string from those two columns, passing that string to a function, and using whatever the function returns as the value of the third column.

Problem 1

base_string = "my name is {first} {last}"

df['summary'] = base_string.format(first=df['first_name'], last=df['last_name'])

Problem 2

df['summary'] = some_func(base_string.format(first=df['first_name'], last=df['last_name']))

My ultimate goal is to solve Problem 2, but Problem 1 is a prerequisite and so far I have been unable to solve it. I tried converting my DataFrame values to strings, but that did not work the way I expected.

desertnaut
Anand

2 Answers

You can use apply:

df['summary'] = df.apply(lambda r: base_string.format(first=r['first_name'], last=r['last_name']),
                         axis=1)

Or list comprehension:

df['summary'] = [base_string.format(first=x, last=y)
                 for x, y in zip(df['first_name'], df['last_name'])]

And then, for general function some_func:

df['summary'] = [some_func(base_string.format(first=x, last=y))
                 for x, y in zip(df['first_name'], df['last_name'])]
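
For context, the attempt in the question fails because `str.format` is called once with the whole Series objects, so a single string containing their printed representation is assigned to every row. Below is a minimal, self-contained sketch of the list-comprehension approach. The sample data and the body of `some_func` are assumptions made up for illustration; the question does not say what `some_func` does.

```python
import pandas as pd

base_string = "my name is {first} {last}"


def some_func(text):
    # Hypothetical stand-in for the asker's function: upper-cases the string.
    return text.upper()


# Sample data invented for this example.
df = pd.DataFrame(
    {"first_name": ["Ada", "Alan"], "last_name": ["Lovelace", "Turing"]}
)

# Format one string per row, then pass each string through some_func.
df["summary"] = [
    some_func(base_string.format(first=first, last=last))
    for first, last in zip(df["first_name"], df["last_name"])
]

print(df["summary"].tolist())
```

Because `zip` walks the two columns in parallel, each call to `format` receives plain Python strings rather than Series.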
Quang Hoang

You could use pandas.DataFrame.apply with axis=1 so your code will look like this:

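def mapping_function(row):
    # Build the string from this row's columns, then pass it to some_func
    text = base_string.format(first=row['first_name'], last=row['last_name'])
    return some_func(text)

df['summary'] = df.apply(mapping_function, axis=1)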
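
A runnable sketch of this row-wise approach might look like the following. The sample data, the extra `age` column, and the body of `some_func` (here it returns the string length) are all assumptions for illustration. Selecting only the two needed columns before calling `apply` means each row passed to the function carries just those values.

```python
import pandas as pd

base_string = "my name is {first} {last}"


def some_func(text):
    # Hypothetical stand-in for the asker's function: returns the string length.
    return len(text)


def mapping_function(row):
    # row is a Series holding one row's values, indexed by column name.
    text = base_string.format(first=row["first_name"], last=row["last_name"])
    return some_func(text)


# Sample data invented for this example; "age" is unused on purpose.
df = pd.DataFrame(
    {
        "first_name": ["Ada", "Alan"],
        "last_name": ["Lovelace", "Turing"],
        "age": [36, 41],
    }
)

# Restrict to the columns the function needs, then apply row by row.
df["summary"] = df[["first_name", "last_name"]].apply(mapping_function, axis=1)

print(df)
```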
Zeno Dalla Valle
  • It works, but I think passing the entire row to a custom function would be highly inefficient. It would be best to pass only the necessary columns. – Caio Castro Mar 09 '21 at 21:08
  • Look at this question; maybe np.vectorize could work for you: https://stackoverflow.com/questions/52673285/performance-of-pandas-apply-vs-np-vectorize-to-create-new-column-from-existing-c – Zeno Dalla Valle Mar 09 '21 at 21:15
  • I never looked into np.vectorize, and my code is plagued with slow .apply(custom_funcs) calls. Really great tip, will definitely check it out :) – Caio Castro Mar 09 '21 at 21:24
  • Yeah, I found it out just now. It's useful for some of my python notebook too. Cheers. – Zeno Dalla Valle Mar 09 '21 at 21:26
  • Note that `np.vectorize` is **not** vectorization by any means. It is just a wrapped `for` loop. – Quang Hoang Mar 10 '21 at 04:54
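
To make the last comment concrete, here is a small sketch showing that `np.vectorize` still calls the Python function for each element. The sample data and the call-recording `some_func` are assumptions for illustration only. `otypes` is passed explicitly so that NumPy does not need an extra call on the first element to infer the output type.

```python
import numpy as np
import pandas as pd

base_string = "my name is {first} {last}"
calls = []  # records every Python-level call to some_func


def some_func(first, last):
    # Hypothetical per-row function that also logs its arguments.
    calls.append((first, last))
    return base_string.format(first=first, last=last)


# Sample data invented for this example.
df = pd.DataFrame(
    {
        "first_name": ["Ada", "Alan", "Grace"],
        "last_name": ["Lovelace", "Turing", "Hopper"],
    }
)

# np.vectorize wraps some_func so it accepts arrays, but it loops internally.
vectorized = np.vectorize(some_func, otypes=[object])
df["summary"] = vectorized(
    df["first_name"].to_numpy(), df["last_name"].to_numpy()
)

# The function ran at the Python level for each row.
print(len(calls), "calls for", len(df), "rows")
```

It can still be convenient because only the needed columns are passed, as plain values rather than row Series, but it should not be expected to behave like a true array operation.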