Say I have a vector of people's names in my dataframe:

names <- c("Bernice Ingram", "Dianna Dean", "Philip Williamson", "Laurie Abbott",
           "Rochelle Price", "Arturo Fisher", "Enrique Newton", "Sarah Mann",
           "Darryl Graham", "Arthur Hoffman")

I want to create a vector with the first names. All I know about them is that they come first in each element of the vector above and that they're followed by a space. In other words, this is what I'm looking for:

"Bernice" "Dianna"  "Philip" "Laurie" "Rochelle"
"Arturo"  "Enrique" "Sarah"  "Darryl" "Arthur"

I've found a similar question here, but the answers (especially this one) haven't helped much. So far, I've tried a couple of variations of functions from the grep family, and the closest I could get to something useful was running strsplit(names, " ") to separate first and last names and then strsplit(names, " ")[[1]][1] to get just the first name of the first person. I've been trying to tweak this last command to give me a whole vector of first names, to no avail.

Waldir Leoncio
  • This assumes that people have first names. [That is wrong](https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/). – Raedwald Oct 01 '19 at 10:21
  • See also https://stackoverflow.com/questions/1122328/first-name-middle-name-last-name-why-not-full-name – Raedwald Oct 01 '19 at 10:22

4 Answers

Use sapply to extract the first name:

> sapply(strsplit(names, " "), `[`, 1)
 [1] "Bernice"  "Dianna"   "Philip"   "Laurie"   "Rochelle" "Arturo"   "Enrique" 
 [8] "Sarah"    "Darryl"   "Arthur"

Some comments:

The above works just fine. To make it a bit more general, you could change the split argument of strsplit from " " to "\\s+", which also covers multiple spaces. You could also use gsub to extract everything before the first space directly. This last approach uses only one function call and is likely to be faster (though I haven't checked with a benchmark).
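
As a rough sketch of the two variations mentioned in the comments, using the sample vector from the question: the variable names first_split and first_sub are just illustrative, and vapply is swapped in for sapply only to make the return type explicit. Note that a string with leading whitespace would yield an empty first piece with the strsplit variant, so trimming first (e.g. with trimws) may be needed on messier data.

```r
# Sample data from the question
names <- c("Bernice Ingram", "Dianna Dean", "Philip Williamson", "Laurie Abbott",
           "Rochelle Price", "Arturo Fisher", "Enrique Newton", "Sarah Mann",
           "Darryl Graham", "Arthur Hoffman")

# Variation 1: split on one or more whitespace characters,
# then take the first piece of each element
first_split <- vapply(strsplit(names, "\\s+"), `[`, character(1), 1)

# Variation 2: a single call that deletes everything from the
# first whitespace character to the end of the string
first_sub <- sub("\\s.*$", "", names)

# Both approaches should agree on this data
identical(first_split, first_sub)
```

Whether the single sub call is actually faster would still need a benchmark on realistic data.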

Michele

For what you want, here's a pretty unorthodox way to do it:

read.table(text = names, header = FALSE, stringsAsFactors=FALSE, fill = TRUE)[[1]]
# [1] "Bernice"  "Dianna"   "Philip"   "Laurie"   "Rochelle" "Arturo"   "Enrique"  "Sarah"   
# [9] "Darryl"   "Arthur"  
A5C1D2H2I1M1N2O1R2T1

This seems to work:

unlist(strsplit(names,' '))[seq(1,2*length(names),2)]

This assumes that no first or last name has a space in it.
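
To illustrate why that assumption matters, here is a small sketch with a made-up vector (people and "Mary Ann Smith" are hypothetical, not from the question). Once one element splits into three pieces, the every-second-element indexing drifts out of alignment, whereas taking the first piece of each list element does not.

```r
# Hypothetical data: the second person has a space inside the first name
people <- c("Bernice Ingram", "Mary Ann Smith", "Sarah Mann")

pieces <- strsplit(people, " ")
lengths(pieces)  # number of pieces per person: 2 3 2

# Indexing the flattened vector assumes exactly two pieces per person,
# so the third result is a last name instead of "Sarah"
flat <- unlist(pieces)[seq(1, 2 * length(people), 2)]
flat  # "Bernice" "Mary" "Smith"

# Taking the first piece per element stays aligned with the input
per_element <- vapply(pieces, `[`, character(1), 1)
per_element  # "Bernice" "Mary" "Sarah"
```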

zzxx53

Using a regular expression with gsub:

> gsub("^(.*?)\\s.*", "\\1", names)
 [1] "Bernice"  "Dianna"   "Philip"   "Laurie"   "Rochelle" "Arturo"   "Enrique"  "Sarah"   
 [9] "Darryl"   "Arthur"  
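
A brief sketch of how the pattern behaves, using a made-up vector x ("Cher" is a hypothetical single-word name): ^(.*?) lazily captures as little as possible from the start, \\s matches the first whitespace character, and .* consumes the rest, so the whole string is replaced by the captured group. When there is no whitespace at all, the pattern does not match and the string comes back unchanged. The regmatches/regexpr variant is shown only as one possible alternative.

```r
# Hypothetical data: the second element contains no space
x <- c("Bernice Ingram", "Cher")

# Same pattern as above; elements without whitespace are returned as-is
via_gsub <- gsub("^(.*?)\\s.*", "\\1", x)
via_gsub  # "Bernice" "Cher"

# Alternative: extract the leading run of non-whitespace characters.
# Caveat: regmatches() drops elements with no match (e.g. empty strings),
# so the result can be shorter than the input.
via_regmatches <- regmatches(x, regexpr("^[^[:space:]]+", x))
via_regmatches  # "Bernice" "Cher"
```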
Jilber Urbina