Separate "Name" into "FirstName" and "LastName" columns of data frame

Question

I am struggling to figure out how to take a single "Name" column in a data frame and split it into two other columns, FirstName and LastName, within the same data frame. The challenge is that some of my names have several last names. Essentially, I want to take the first word (the first element of the string) and put it in the FirstName column, then put all of the following text (minus the separating space, of course) into the LastName column.

This is my data frame "tteam":

NAME <- c('John Doe','Peter Gynn','Jolie Hope-Douglas', 'Muhammad Arnab Halwai')
TITLE <- c("assistant", "manager", "assistant", "specialist")
tteam<- data.frame(NAME, TITLE)

My desired output would look like this:

FirstName <- c("John", "Peter", "Jolie", "Muhammad")
LastName <- c("Doe", "Gynn", "Hope-Douglas", "Arnab Halwai")
tteamdesire <- data.frame(FirstName, LastName, TITLE)

I have tried the following code to create a new data frame of just the names, which lets me extract the first names from the first column. However, I can't get the last names to line up correctly.

names <- tteam$NAME ##  puts full names into names vector
namesdf <- data.frame(do.call('rbind', strsplit(as.character(names),' ',fixed=TRUE))) 
## splits out all names into a dataframe PROBLEM IS HERE!
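The rbind step goes wrong because strsplit() returns pieces of different lengths (two words for most names, three for "Muhammad Arnab Halwai"), and rbind() recycles the shorter vectors to fill the widest row, so "John Doe" becomes "John" "Doe" "John" (with a warning). A minimal base-R sketch of "split on the first space only", assuming every name contains at least one space:

```r
NAME <- c('John Doe', 'Peter Gynn', 'Jolie Hope-Douglas', 'Muhammad Arnab Halwai')
TITLE <- c("assistant", "manager", "assistant", "specialist")

# Character position of the first space in each name (-1 if there is none)
pos <- as.integer(regexpr(" ", NAME, fixed = TRUE))

# Everything before the first space is the first name ...
FirstName <- substr(NAME, 1, pos - 1)
# ... and everything after it (possibly several words) is the last name.
# Note: a name with no space at all would end up entirely in LastName.
LastName <- substr(NAME, pos + 1, nchar(NAME))

tteamdesire <- data.frame(FirstName, LastName, TITLE, stringsAsFactors = FALSE)
tteamdesire
```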

I don't think you can get this one correct. Some people have several given names, some have several family names and then there is something called "middle name". — Roland, Oct 21 '14 at 14:37
http://stackoverflow.com/questions/19321673/extracting-first-names-in-r — GSee, Oct 21 '14 at 14:41
I believe this post should help you: http://stackoverflow.com/questions/8299978/splitting-a-string-on-the-first-space — crkatz, Oct 21 '14 at 14:43
Note that in the real world this is a pointless exercise - people have multiple first or last names, put their family names first and familiar names second, or third, and I once heard of a Chinese student whose three-part name appeared in the student records database in all 6 possible permutations of A B C. — Spacedman, Oct 21 '14 at 15:23

score 8 · Accepted Answer · answered Oct 21 '14 at 15:10


You could use extract() from tidyr:

 library(tidyr)
 extract(tteam, NAME, c("FirstName", "LastName"), "([^ ]+) (.*)")
 #  FirstName     LastName      TITLE
 #1      John          Doe  assistant
 #2     Peter         Gynn    manager
 #3     Jolie Hope-Douglas  assistant
 #4  Muhammad Arnab Halwai specialist
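The pattern "([^ ]+) (.*)" does the work here: the first group grabs a run of non-space characters, the literal space consumes the first space, and the greedy (.*) takes everything that is left, including any further spaces. The same capture groups can be inspected in base R with regexec() and regmatches(); this is a sketch of the idea, not how tidyr implements extract() internally:

```r
NAME <- c('John Doe', 'Peter Gynn', 'Jolie Hope-Douglas', 'Muhammad Arnab Halwai')

# For each name: c(full match, group 1, group 2); character(0) if no match
m <- regmatches(NAME, regexec("([^ ]+) (.*)", NAME))

# Pull out the capture groups; a non-matching name would give NA here
FirstName <- vapply(m, `[`, character(1), 2)  # first capture group
LastName  <- vapply(m, `[`, character(1), 3)  # second group: rest of the string
```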

akrun

Thanks so much for all of the submissions. Chose this answer because of its simplicity. Works now. Thanks again! – RyanL Oct 21 '14 at 18:28

score 5 · Answer 2 · answered Oct 21 '14 at 15:13


Try:

> firstname = sapply(strsplit(NAME, ' '), function(x) x[1])
> firstname 
[1] "John"     "Peter"    "Jolie"    "Muhammad"

> lastname = sapply(strsplit(NAME, ' '), function(x) x[length(x)])
> lastname
[1] "Doe"          "Gynn"         "Hope-Douglas" "Halwai"

or:

> ll = strsplit(NAME, ' ')
> 
> firstname = sapply(ll, function(x) x[1])
> lastname = sapply(ll, function(x) x[length(x)])
> 
> firstname
[1] "John"     "Peter"    "Jolie"    "Muhammad"
> lastname
[1] "Doe"          "Gynn"         "Hope-Douglas" "Halwai"

rnso

The last name in the list has a 'middle' name also. To avoid that I used x[length(x)]. – rnso Oct 21 '14 at 15:16
Oops, I didn't see it, but in tteamdesire the middle name is joined to the last name. – akrun Oct 21 '14 at 15:17
Consider `str_split(NAME, " ", n=2)` from `stringr` – akrun Oct 21 '14 at 15:19
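The comments above come down to keeping every word after the first one, rather than only the last. `str_split(NAME, " ", n = 2)` does that by capping the split at two pieces; a minimal base-R sketch of the same idea reuses the strsplit() list from the answer and pastes the remaining words back together (assuming single spaces between words):

```r
NAME <- c('John Doe', 'Peter Gynn', 'Jolie Hope-Douglas', 'Muhammad Arnab Halwai')
ll <- strsplit(NAME, ' ')

# First word of each name
firstname <- sapply(ll, function(x) x[1])
# Drop the first word and re-join the remaining words with single spaces
lastname <- sapply(ll, function(x) paste(x[-1], collapse = ' '))

lastname
```

Unlike x[length(x)], this keeps "Arnab Halwai" together, matching tteamdesire.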

G. Grothendieck · Answer 3 · 2014-10-21T16:46:29.167

1) sub

data.frame(FirstName = sub(" .*", "", tteam$NAME), 
           LastName = sub("^\\S* ", "", tteam$NAME),
           tteam[-1])

2) gsubfn::read.pattern. In the NAME <- line we can omit as.character if NAME is already character (as opposed to factor):

library(gsubfn)  # provides read.pattern

cn <- c("FirstName", "LastName")
NAME <- as.character(tteam$NAME)

cbind( read.pattern(text = NAME, pattern = "^(\\S*) (.*)", col.names = cn), tteam[-1])
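If the gsubfn dependency is undesirable, base R's utils::strcapture() fills a data frame from regex capture groups in a similar way. A sketch using an equivalent pattern, with the column names and types taken from a prototype data frame (the proto object is illustrative, not part of the original answer):

```r
NAME <- c('John Doe', 'Peter Gynn', 'Jolie Hope-Douglas', 'Muhammad Arnab Halwai')
TITLE <- c("assistant", "manager", "assistant", "specialist")
tteam <- data.frame(NAME, TITLE, stringsAsFactors = FALSE)

# Prototype: one character column per capture group
proto <- data.frame(FirstName = character(), LastName = character(),
                    stringsAsFactors = FALSE)

# Group 1 = first run of non-spaces, group 2 = everything after the first space
parts <- utils::strcapture("^([^ ]+) (.*)$", as.character(tteam$NAME), proto)

# Put the new name columns in front of the remaining columns (TITLE)
result <- cbind(parts, tteam[-1])
result
```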

Update: rewrote the solution in terms of tteam and added a second solution.

score 2 · Answer 4 · answered Oct 08 '19 at 15:17

You could use the unglue package:

library(unglue)
unglue_unnest(tteam, NAME, "{FirstName} {LastName}")
#>        TITLE FirstName     LastName
#> 1  assistant      John          Doe
#> 2    manager     Peter         Gynn
#> 3  assistant     Jolie Hope-Douglas
#> 4 specialist  Muhammad Arnab Halwai
