Split a column in df by another column value

Question

In Python, I have the following pandas df (headers in first row):

FullName          FirstName
'MichaelJordan'   'Michael'
'KobeBryant'      'Kobe'
'LeBronJames'     'LeBron'

I am trying to split each record in "FullName" based on the value in "FirstName" but am not having luck...

This is what I tried:

df['Names'] = df['FullName'].str.split(df['FirstName'])

Which produces the error:

'Series' objects are mutable, thus they cannot be hashed

Desired output:

print(df['Names'])

['Michael', 'Jordan']
['Kobe', 'Bryant']
['LeBron', 'James']

https://stackoverflow.com/questions/29700552/series-objects-are-mutable-and-cannot-be-hashed-error — Pygirl, Feb 21 '20 at 20:35

piRSquared · Accepted Answer · 2020-02-21T20:47:00.273

5

`str.replace`

lastnames = [full.replace(first, '') for full, first in zip(df.FullName, df.FirstName)]
df.assign(LastName=lastnames)

        FullName FirstName LastName
0  MichaelJordan   Michael   Jordan
1     KobeBryant      Kobe   Bryant
2    LeBronJames    LeBron    James

Same exact idea but using map

df.assign(LastName=[*map(lambda a, b: a.replace(b, ''), df.FullName, df.FirstName)])

        FullName FirstName LastName
0  MichaelJordan   Michael   Jordan
1     KobeBryant      Kobe   Bryant
2    LeBronJames    LeBron    James
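
One caveat with the approach above: `str.replace` removes every occurrence of the first name, so a row like 'JoJones' with first name 'Jo' would come back as 'nes'. Below is a minimal sketch that only strips the first name when it is a prefix. The helper name `strip_first` is made up for illustration and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "FullName": ["MichaelJordan", "KobeBryant", "LeBronJames"],
    "FirstName": ["Michael", "Kobe", "LeBron"],
})


def strip_first(full: str, first: str) -> str:
    # Hypothetical helper: drop `first` only when it is a prefix of `full`;
    # otherwise return `full` unchanged.
    return full[len(first):] if full.startswith(first) else full


last_names = [
    strip_first(full, first)
    for full, first in zip(df["FullName"], df["FirstName"])
]

# assign returns a new DataFrame and leaves df untouched
result = df.assign(LastName=last_names)
print(result)
```

On Python 3.9+ the built-in `full.removeprefix(first)` should do the same thing as the helper.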

edited Feb 21 '20 at 20:47

answered Feb 21 '20 at 20:40

piRSquared



This is exactly what I was looking for. Thank you! – Andrew Vitek Feb 21 '20 at 20:44

score 3 · Answer 2 · answered Feb 21 '20 at 20:41

Since you are doing a row-wise operation, we can use apply.

The idea is to replace the first name with itself plus a comma and a space, which gives us a delimiter to split on.

df["SplitName"] = df.apply(
    lambda x: x["FullName"].replace(x["FirstName"], f"{x['FirstName']}, "), axis=1
)


# split on ', ' (comma plus space) so the last name has no leading space
print(df['SplitName'].str.split(', ', expand=True))

         0       1
0  Michael  Jordan
1     Kobe  Bryant
2   LeBron   James
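
If the names themselves might contain a comma, the inserted delimiter gets ambiguous. As a rough alternative sketch (not what the answer above does), Python's `str.partition` can split directly on the first name without inserting any delimiter. The variable names `parts` and `split_names` are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "FullName": ["MichaelJordan", "KobeBryant", "LeBronJames"],
    "FirstName": ["Michael", "Kobe", "LeBron"],
})

# str.partition returns a 3-tuple (before, separator, after), split at the
# first occurrence of the separator. If the separator is not found it
# returns (full, '', '').
parts = [
    full.partition(first)
    for full, first in zip(df["FullName"], df["FirstName"])
]

# Keep the matched first name and whatever follows it.
split_names = pd.DataFrame(
    [[sep, after] for _, sep, after in parts],
    index=df.index,
)
print(split_names)
```

Rows where the first name does not appear at all end up with two empty strings, so those are easy to filter out afterwards.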

score 3 · Answer 3 · answered Feb 21 '20 at 20:50


>>> df.assign(names=[[firstname, fullname[len(firstname):]] 
                     for fullname, firstname in df[['FullName', 'FirstName']].values])
        FullName FirstName              names
0  MichaelJordan   Michael  [Michael, Jordan]
1     KobeBryant      Kobe     [Kobe, Bryant]
2    LeBronJames    LeBron    [LeBron, James]

answered Feb 21 '20 at 20:50

Alexander



Clever... given that first names are typically... uh.. first (-: – piRSquared Feb 21 '20 at 20:51

score 0 · Answer 4 · answered Feb 21 '20 at 20:46

This is a one-liner using apply. It splits FullName at the length of FirstName, and returns an empty string for rows where FullName does not start with FirstName:

df['Names'] = df.apply(lambda row: [row['FullName'][:len(row['FirstName'])], row['FullName'][len(row['FirstName']):]] if row['FullName'].startswith(row['FirstName']) else '', axis=1)

        FullName FirstName              Names
0  MichaelJordan   Michael  [Michael, Jordan]
1     KobeBryant      Kobe     [Kobe, Bryant]
2    LeBronJames    LeBron    [LeBron, James]
