I have imported a CSV dataset into Python and am cleaning it up. The names are inconsistent: some are "John Doe" and others are "Doe, John". I need them all to be "First name Last name" without the comma:

Doe, John  
Smith, John 
Snow, John
John Cena 
Steve Smith 

What I want:

 John Doe 
 John Smith 
 John Snow
 John Cena 
 Steve Smith

I tried doing:

if ',' in df['names']:
    df['names'] = ' '. join(df['names'].split(',')[::-1]).strip()

I get

AttributeError: 'Series' object has no attribute 'split'

I also tried converting the column to a list before running the code above, but that didn't work:

df['names'] = df['names'].to_list()
mkrieger1
  • Does this answer your question? [How to split a dataframe string column into two columns?](https://stackoverflow.com/questions/14745022/how-to-split-a-dataframe-string-column-into-two-columns) – mkrieger1 Feb 15 '23 at 17:15
  • `df['names'].str.split(', ').str[::-1].str.join(' ')` – rhug123 Feb 15 '23 at 17:22

2 Answers

You can use `str.replace` with capture groups to swap the two parts:

df['names'] = df['names'].str.replace(r'([^,]+),\s*(.+)', r'\2 \1', regex=True)
print(df)

# Output
         names
0     John Doe
1   John Smith
2    John Snow
3    John Cena
4  Steve Smith
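
To see what the pattern does, here is a minimal sketch that applies the same regex to plain strings with Python's `re.sub`, outside pandas. The helper name `swap_name` and the sample strings are made up for illustration. Names without a comma do not match the pattern at all, so they pass through unchanged.

```python
import re

# Same pattern and replacement as in the answer above.
PATTERN = r'([^,]+),\s*(.+)'   # group 1: text before the comma; group 2: text after it
REPLACEMENT = r'\2 \1'         # emit group 2, a space, then group 1


def swap_name(name: str) -> str:
    """Hypothetical helper: apply the swap to a single string."""
    return re.sub(PATTERN, REPLACEMENT, name)


swapped = [swap_name(n) for n in ["Doe, John", "Smith,John", "John Cena"]]
print(swapped)  # ['John Doe', 'John Smith', 'John Cena']
```

The `\s*` after the comma is what lets both "Doe, John" and "Smith,John" come out with exactly one space between the two parts.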

Note: you have to use the `str` accessor in your code (though this alone does not solve the next problem):

# Replace
df['names'].split(',')

# With
df['names'].str.split(',')
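
As a rough sketch of how the original attempt could be written entirely with the `str` accessor: the `if ',' in df['names']` test is a likely second stumbling block, because `in` on a Series checks the index labels rather than the values. The variable names `has_comma` and `swapped` and the sample data are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'names': ['Doe, John', 'Smith, John', 'John Cena']})

# `in` on a Series tests the index labels (0, 1, 2), not the values,
# so this prints False even though some names contain a comma.
print(',' in df['names'])

# Element-wise test instead: one boolean per row.
has_comma = df['names'].str.contains(',', regex=False)

# Split on the comma, reverse the parts, join with a space, trim whitespace.
swapped = df['names'].str.split(',').str[::-1].str.join(' ').str.strip()

# Only overwrite the rows that actually contained a comma.
df.loc[has_comma, 'names'] = swapped[has_comma]
print(df['names'].tolist())  # ['John Doe', 'Smith John' is NOT produced; see note below
```

The final list is `['John Doe', 'John Smith', 'John Cena']`. The mask is not strictly required here, since a name without a comma splits into a single part and comes back unchanged, but it makes the intent of the original `if` explicit.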
Corralien

You can use a lambda function to process each name:

df['names'] = df['names'].apply(
    lambda x: (x.split(',')[1] + ' ' + x.split(',')[0]).strip()
    if ',' in x else x
)

`split(',')` splits the name into two strings, which you access by index: `[0]` is the part before the comma (the last name) and `[1]` is the part after it (the first name). You then concatenate `[1]` with `[0]` and remove leading and trailing whitespace with `strip()`. All of this happens only if `x` (each individual name) contains a comma; otherwise `x` is left as is.
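
The same logic can also be written as a named function, which splits only once and is a little easier to read and test. This is a sketch, not a drop-in replacement: the name `first_last` is hypothetical, it assumes every value in the column is a string (no missing values), and unlike the lambda it strips each part separately.

```python
import pandas as pd


def first_last(name: str) -> str:
    """Hypothetical helper: turn 'Last, First' into 'First Last'."""
    if ',' not in name:
        return name  # no comma: leave the value as is
    # maxsplit=1 guarantees exactly two parts even if there is a second comma
    last, first = name.split(',', 1)
    return f"{first.strip()} {last.strip()}"


df = pd.DataFrame({'names': ['Doe, John', 'Snow,John', 'Steve Smith']})
df['names'] = df['names'].apply(first_last)
print(df['names'].tolist())  # ['John Doe', 'John Snow', 'Steve Smith']
```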

nahimmedto