Pandas - split columns with default values if no delimiter present

Question

I have the DataFrame below and am trying to split the 'name' column into first_name and last_name on a space. Some names contain no delimiter; in those cases I want the value to go into last_name and first_name to be blank.

One possible way is to iterate over all the rows and apply an if-else condition to each row. However, as mentioned in this post:

"Iteration in Pandas is an anti-pattern and is something you should only do when you have exhausted every other option."

So I am looking for a way to achieve this in pandas without iterating.

import io
import pandas as pd

# obj is defined elsewhere (not shown in the question)
names_df = pd.read_csv(io.BytesIO(obj['Body'].read()))
print(names_df)
names_df[['first_name', 'last_name']] = names_df['name'].str.split(' ', expand=True)
print(names_df)

ValueError: Columns must be same length as key

order_id  name           product_id  product_price
0         Thanos         Ipad        800
1         Hulk           AC          400
2         C America      Ipad        760
3         Black Panther  IPhone      1100

Expected Dataframe:

first_name  last_name
            Thanos
            Hulk
C           America
Black       Panther

Which version of pandas are you using? For me the code seems to work, however everything is added to the first_name column and only the last two have a last name. (pandas version 1.0.0) — ScootCork, Jun 20 '20 at 14:13
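
The behaviour described in this comment can be reproduced with a small sketch. The DataFrame here is rebuilt by hand from the sample rows in the question (an assumption, since the original CSV is not available). A plain split fills columns from left to right, so single-word names end up in the first column. The ValueError in the question would be expected whenever the split produces a number of columns other than two, for example if some name contained three words.

```python
import pandas as pd

# Sample rows copied from the question; the real data comes from a CSV
names_df = pd.DataFrame({"name": ["Thanos", "Hulk", "C America", "Black Panther"]})

# expand=True returns one column per token, padded on the right with missing values
parts = names_df["name"].str.split(" ", expand=True)
print(parts)
```

With this data, column 0 holds "Thanos", "Hulk", "C", "Black", and column 1 is missing for the first two rows, which is the opposite of the expected layout.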

Dishin H Goyani · Answer 1 · 2020-06-20T16:13:52.730

4

First split, then reverse each row's list with str[::-1] so that the last name always comes first:

df[['last_name','first_name']] = df.name.str.split().str[::-1].apply(pd.Series).fillna('')

df
            name last_name first_name
0         Thanos    Thanos        
1           Hulk      Hulk        
2      C America   America          C
3  Black Panther   Panther      Black
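
To make the one-liner easier to follow, here is a step-by-step sketch of the same idea. It assumes the sample names from the question, and it swaps apply(pd.Series) for the plain DataFrame constructor, which is a substitution on my part rather than what the answer uses.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Thanos", "Hulk", "C America", "Black Panther"]})

# Step 1: split on whitespace, giving one list of tokens per row
tokens = df["name"].str.split()

# Step 2: reverse each list so the last word is at position 0
reversed_tokens = tokens.str[::-1]

# Step 3: build a frame from the lists; short rows are padded with missing
# values, which are then replaced by empty strings
wide = pd.DataFrame(reversed_tokens.tolist(), index=df.index).fillna("")

df[["last_name", "first_name"]] = wide
print(df)
```

Because the lists are reversed before expanding, a single-word name lands in the first expanded column, which is assigned to last_name.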

edited Jun 20 '20 at 16:13

answered Jun 20 '20 at 14:08

Dishin H Goyani


score 3 · Accepted Answer · answered Jun 20 '20 at 14:19

Here is the solution I came up with. I am not sure it is the most efficient one, but it works:

import pandas as pd

df = pd.DataFrame({'name': ['Thanos', 'Hulk', 'Black Panther', 'C America']})

def split_name(name):
    # Two words -> [first, last]; otherwise the first token is used as the last name
    parts = name.split(' ')
    return parts if len(parts) == 2 else ['', parts[0]]

pd.DataFrame(df.name.apply(split_name).tolist(), columns=['first_name', 'last_name'])

  first_name last_name
0               Thanos
1                 Hulk
2      Black   Panther
3          C   America
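
A possible alternative that avoids a Python-level function, offered as a sketch rather than as part of this answer: Series.str.rpartition splits at the last occurrence of the separator and returns empty strings for the "before" and "separator" parts when no separator is found, which matches the rule in the question. The sample names are assumed from the question.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Thanos", "Hulk", "Black Panther", "C America"]})

# rpartition returns three columns: text before the last space (0),
# the separator itself (1), and text after it (2).
# With no space in the name, columns 0 and 1 are empty strings.
parts = df["name"].str.rpartition(" ")

df["first_name"] = parts[0]
df["last_name"] = parts[2]
print(df)
```

For names with more than two words, this puts everything before the last space into first_name; whether that is the desired behaviour depends on the data.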

Shubham Sharma · Answer 3 · 2020-06-20T14:14:11.717

1

Use Series.str.extract with named regex capturing groups to extract the first and last name from the name column, then fill the missing last names from the original column:

df1 = names_df['name'].str.extract(r'(?P<First_Name>\w+)\s(?P<Last_Name>\w+)')
df1['Last_Name'] = df1['Last_Name'].fillna(names_df['name'])

# print(df1)
  First_Name Last_Name
0        NaN    Thanos
1        NaN      Hulk
2          C   America
3      Black   Panther
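
A variation on the same technique, as a sketch under the assumption that names look like the samples in the question: making the first-name group optional lets a single pattern handle both cases, and a final fillna('') gives blank first names instead of NaN. The pattern and the lowercase group names are my own choices, not part of the answer above.

```python
import pandas as pd

names_df = pd.DataFrame({"name": ["Thanos", "Hulk", "C America", "Black Panther"]})

# Optional "first name plus whitespace" prefix, then a last name with no spaces.
# When the prefix is absent, first_name is missing and is filled with ''.
pattern = r"^(?:(?P<first_name>.*)\s)?(?P<last_name>\S+)$"
out = names_df["name"].str.extract(pattern).fillna("")
print(out)
```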

edited Jun 20 '20 at 14:14

answered Jun 20 '20 at 14:06

Shubham Sharma

