I really love the pandas DataFrame.assign() method, especially in combination with lambda expressions. However, I ran into unexpected behavior with string concatenation that I don't understand yet. I found this thread, but it does not answer my question: String concatenation of two pandas columns

Minimal working example of my problem:

import pandas as pd
df = pd.DataFrame({'Firstname': ['Sandy', 'Peter', 'Dolly'],
                   'Surname': ['Sunshine', 'Parker', 'Dumb']})

which returns

  Firstname   Surname
0     Sandy  Sunshine
1     Peter    Parker
2     Dolly      Dumb

Now, if I'd like to assign e.g. Full Name I thought I could simply do:

df = df.assign(**{'Full Name': lambda x: f'{x.Firstname} {x.Surname}'})

but instead of creating a new string such as "Sandy Sunshine" for each row (as I expected), this builds one string from all rows and puts it in every row:

[screenshot: weird_pandas_assign_behavior]

Could anyone explain to me why my approach does not work, and why this

df = df.assign(**{'Full Name': lambda x: x.Firstname + ' ' + x.Surname})

obviously works? Thank you :)

Ch3steR

3 Answers

df.assign(**{'Full Name': lambda x: f'{x.Firstname} {x.Surname}'})

That's where it goes wrong.

An f-string inserts the string representation of whatever is evaluated inside the {} into the resulting string. Example:

print(f"Hello {df.Firstname} world")
Hello 0    Sandy
1    Peter
2    Dolly
Name: Firstname, dtype: object world

So, the output of f'{x.Firstname} {x.Surname}' would be

0    Sandy
1    Peter
2    Dolly
Name: Firstname, dtype: object 0    Sunshine
1      Parker
2        Dumb
Name: Surname, dtype: object

Now, assigning a single scalar broadcasts it to every row, so df.assign(new_col='a') would output:

  Firstname   Surname new_col
0     Sandy  Sunshine       a
1     Peter    Parker       a
2     Dolly      Dumb       a

That is why you got the string below in every row.

0    Sandy
1    Peter
2    Dolly
Name: Firstname, dtype: object 0    Sunshine
1      Parker
2        Dumb
Name: Surname, dtype: object
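
A minimal sketch to check this, reusing the DataFrame from the question (the names `whole` and `out` are only illustrative, not from the original post): the lambda returns one plain str, and assign() repeats that single value in every row.

```python
import pandas as pd

df = pd.DataFrame({'Firstname': ['Sandy', 'Peter', 'Dolly'],
                   'Surname': ['Sunshine', 'Parker', 'Dumb']})

# The f-string is evaluated once on the whole columns and yields a single str.
whole = f'{df.Firstname} {df.Surname}'

out = df.assign(**{'Full Name': lambda x: f'{x.Firstname} {x.Surname}'})

# assign() broadcasts that one scalar, so all rows hold the same value.
print(out['Full Name'].nunique())         # 1
print((out['Full Name'] == whole).all())  # True
```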

In the second case:

df.assign(**{'Full Name': lambda x: x.Firstname + ' ' + x.Surname})

This is equivalent (apart from the column name) to

df.assign(Full_name = df['Firstname'] + ' ' + df['Surname'])

It's just element-wise string concatenation, so it works as intended.

You can use pd.Series.str.cat here.

df['Full Name'] = df['Firstname'].str.cat(df['Surname'],sep=' ')
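
If you prefer to stay in the assign() chaining style from the question, the same str.cat call should also work inside the lambda. This is only a sketch on the example data; `out` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'Firstname': ['Sandy', 'Peter', 'Dolly'],
                   'Surname': ['Sunshine', 'Parker', 'Dumb']})

# str.cat joins the two columns element-wise, with sep between the values.
out = df.assign(**{'Full Name': lambda x: x['Firstname'].str.cat(x['Surname'], sep=' ')})

print(out['Full Name'].tolist())
# ['Sandy Sunshine', 'Peter Parker', 'Dolly Dumb']
```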
Ch3steR

The result of

f'{df.Firstname} {df.Surname}'

has type str and is the string representation of a pandas series while the type of

df.Firstname + ' ' + df.Surname

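is pandas.core.series.Series. Because of this, the assignment is treated differently: the str is a single scalar that is repeated in every row, while the Series is aligned row by row.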
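
The difference can be checked directly. A small sketch on the data from the question (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Firstname': ['Sandy', 'Peter', 'Dolly'],
                   'Surname': ['Sunshine', 'Parker', 'Dumb']})

as_fstring = f'{df.Firstname} {df.Surname}'   # one string for the whole columns
as_concat = df.Firstname + ' ' + df.Surname   # one string per row

print(type(as_fstring).__name__)  # str
print(type(as_concat).__name__)   # Series
print(as_concat.tolist())         # ['Sandy Sunshine', 'Peter Parker', 'Dolly Dumb']
```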

JoergVanAken

f-strings do not work element-wise in pandas, because no vectorized implementation exists for arrays.

So your solution combines the whole Series (the columns of df) into a single string.

If you need f-strings, one possible solution is to loop over the zipped columns and process each pair separately:

df = df.assign(**{'Full Name': lambda x: [f'{Firstname} {Surname}' 
                                         for Firstname, Surname in 
                                         zip(x['Firstname'], x['Surname'])]})
print (df)
  Firstname   Surname       Full Name
0     Sandy  Sunshine  Sandy Sunshine
1     Peter    Parker    Peter Parker
2     Dolly      Dumb      Dolly Dumb
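
Another way to keep the f-string, sketched here as an alternative rather than a recommendation, is DataFrame.apply with axis=1, which calls the function once per row. Row-wise apply is generally slower than the vectorized + on large frames.

```python
import pandas as pd

df = pd.DataFrame({'Firstname': ['Sandy', 'Peter', 'Dolly'],
                   'Surname': ['Sunshine', 'Parker', 'Dumb']})

# axis=1 passes each row as a Series, so the f-string sees scalar values.
out = df.assign(**{'Full Name': lambda x: x.apply(
    lambda row: f'{row.Firstname} {row.Surname}', axis=1)})

print(out['Full Name'].tolist())
# ['Sandy Sunshine', 'Peter Parker', 'Dolly Dumb']
```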
jezrael