
I have the dataframe below and I am trying to split the 'name' column into first_name and last_name on the space character. Some names contain no space; in those cases I want the value to go into last_name and first_name to be blank.

One possible way is to iterate over all the rows and apply an if-else condition to each one. However, as mentioned in this post, "Iteration in Pandas is an anti-pattern and is something you should only do when you have exhausted every other option", so I am looking for a way to achieve this in pandas without iterating.

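import io
import pandas as pd

# obj is the response object the CSV is read from (its creation is not shown)
names_df = pd.read_csv(io.BytesIO(obj['Body'].read()))
print(names_df)
names_df[['first_name', 'last_name']] = names_df['name'].str.split(' ', expand=True)
print(names_df)

Output of the first print(names_df):

order_id  name           product_id  product_price
0         Thanos         Ipad        800
1         Hulk           AC          400
2         C America      Ipad        760
3         Black Panther  IPhone      1100

Error raised by the split assignment:

ValueError: Columns must be same length as key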

Expected Dataframe:

first_name  last_name
            Thanos
            Hulk
C           America
Black       Panther
ywbaek
Explorer
  • Which version of pandas are you using? For me the code seems to work, however everything is added to the first_name column and only the last two have a last name. (pandas version 1.0.0) – ScootCork Jun 20 '20 at 14:13
  • @ScootCork I am using 1.0.4 – Explorer Jun 20 '20 at 15:43

3 Answers


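First split each name into a list of words, then reverse each list with str[::-1] so that the last name comes first. Rows with a single word get NaN in the second column, which fillna('') turns into a blank: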

df[['last_name','first_name']] = df.name.str.split().str[::-1].apply(pd.Series).fillna('')

df
            name last_name first_name
0         Thanos    Thanos        
1           Hulk      Hulk        
2      C America   America          C
3  Black Panther   Panther      Black
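
To see why the one-liner works, it may help to run each step separately. The sketch below rebuilds only the name column from the question and applies the same calls one at a time; the intermediate variable names (words, reversed_words, expanded) are mine and not part of the answer.

```python
import pandas as pd

# Only the 'name' column from the question; the other columns are not needed here.
df = pd.DataFrame({'name': ['Thanos', 'Hulk', 'C America', 'Black Panther']})

# Step 1: split on whitespace, giving one list of words per row.
words = df['name'].str.split()

# Step 2: reverse each list so the last word comes first.
reversed_words = words.str[::-1]

# Step 3: expand the lists into columns; one-word rows get NaN in the second column.
expanded = reversed_words.apply(pd.Series)

# Step 4: replace NaN with '' and assign to the two new columns by position.
df[['last_name', 'first_name']] = expanded.fillna('')
print(df)
```

Because the lists are reversed before expanding, the only word of a single-word name lands in the first column, which is why that column is assigned to last_name.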


Dishin H Goyani

Here is the solution I came up with. I am not sure it is the most optimal one, but it works:

import pandas as pd

df = pd.DataFrame({'name': ['Thanos', 'Hulk', 'Black Panther', 'C America']})

def split_name(name):
    # Two words -> [first, last]; otherwise a blank first name and the first word as last name
    parts = name.split(' ')
    return parts if len(parts) == 2 else ['', parts[0]]

pd.DataFrame(df.name.apply(split_name).tolist(), columns=['first_name', 'last_name'])

  first_name last_name
0               Thanos
1                 Hulk
2      Black   Panther
3          C   America
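
One thing to watch with this function: a name with three or more words falls into the else branch, so everything after the first word is dropped. If that can happen in your data, a possible variation is to split only at the last space with str.rsplit. This is just a sketch under the assumption that the last word is always the last name; the function name split_on_last_space and the three-word row are made up for illustration and are not in the question's data.

```python
import pandas as pd

def split_on_last_space(name):
    # maxsplit=1 with rsplit splits only at the last space in the string.
    parts = name.rsplit(' ', 1)
    # One-word names get a blank first name; otherwise keep [first, last].
    return parts if len(parts) == 2 else ['', parts[0]]

# 'Doctor Stephen Strange' is a hypothetical row added to show the three-word case.
df = pd.DataFrame({'name': ['Thanos', 'Black Panther', 'Doctor Stephen Strange']})

result = pd.DataFrame(
    df['name'].apply(split_on_last_space).tolist(),
    columns=['first_name', 'last_name'],
)
print(result)
```

For the one-word and two-word names in the question this gives the same split as the function above; only longer names are treated differently.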

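Use Series.str.extract with named regex capturing groups to extract the first and last name from the name column. Names without a space do not match the pattern, so their last name is filled in from the original column, while First_Name stays NaN for those rows: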

df1 = names_df['name'].str.extract(r'(?P<First_Name>\w+)\s(?P<Last_Name>\w+)')
df1['Last_Name'] = df1['Last_Name'].fillna(names_df['name'])

# print(df1)
  First_Name Last_Name
0        NaN    Thanos
1        NaN      Hulk
2          C   America
3      Black   Panther
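
If you want blanks instead of NaN, as in the expected output of the question, one option is to make the first-name part of the pattern optional and fill the missing values afterwards. This is a sketch of a modified pattern, not the regex used above; the lowercase group names are chosen to match the column names in the question.

```python
import pandas as pd

# Only the 'name' column from the question.
names_df = pd.DataFrame({'name': ['Thanos', 'Hulk', 'C America', 'Black Panther']})

# The non-capturing group (?: ... )? makes "first name plus whitespace" optional,
# so a single word is captured as last_name and first_name is left unmatched.
pattern = r'^(?:(?P<first_name>\w+)\s+)?(?P<last_name>\w+)$'

# Unmatched groups come back as NaN; fillna('') turns them into blanks.
df1 = names_df['name'].str.extract(pattern).fillna('')
print(df1)
```

Note that a name with more than two words does not match this pattern at all and ends up blank in both columns, so it only fits data shaped like the sample in the question.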
Shubham Sharma