Extracting first names in R

Question

Say I have a vector of people's names in my data frame:

names <- c("Bernice Ingram", "Dianna Dean", "Philip Williamson", "Laurie Abbott",
           "Rochelle Price", "Arturo Fisher", "Enrique Newton", "Sarah Mann",
           "Darryl Graham", "Arthur Hoffman")

I want to create a vector with the first names. All I know about them is that they come first in the vector above and that they're followed by a space. In other words, this is what I'm looking for:

"Bernice" "Dianna"  "Philip" "Laurie" "Rochelle"
"Arturo"  "Enrique" "Sarah"  "Darryl" "Arthur"

I've found a similar question here, but the answers (especially this one) haven't helped much. So far, I've tried a couple of variations of functions from the grep family, and the closest I could get to something useful was by running strsplit(names, " ") to separate first and last names and then strsplit(names, " ")[[1]][1] to get just the first name of the first person. I've been trying to tweak this last command to give me a whole vector of first names, to no avail.

This assumes that people have first names. [That is wrong](https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/). — Raedwald, Oct 01 '19 at 10:21
See also https://stackoverflow.com/questions/1122328/first-name-middle-name-last-name-why-not-full-name — Raedwald, Oct 01 '19 at 10:22

Michele · Accepted Answer · 2013-10-11T15:28:35.257

Use sapply to extract the first name:

> sapply(strsplit(names, " "), `[`, 1)
 [1] "Bernice"  "Dianna"   "Philip"   "Laurie"   "Rochelle" "Arturo"   "Enrique" 
 [8] "Sarah"    "Darryl"   "Arthur"

Some comments:

The above works just fine. To make it a bit more general, you could change the split argument of strsplit from " " to "\\s+", which covers multiple spaces. You could also use gsub to extract everything before the first space directly. This last approach uses only one function call and is likely to be faster (but I haven't checked with a benchmark).
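A minimal sketch of the two variations described above. The vector `x` is made up for illustration (it is not from the question) and deliberately contains a double space:

```r
# Hypothetical input: note the double space in the second name
x <- c("Bernice Ingram", "Dianna  Dean", "Philip Williamson")

# Variation 1: split on one or more whitespace characters,
# then take the first piece of each element
first_split <- sapply(strsplit(x, "\\s+"), `[`, 1)

# Variation 2: delete everything from the first whitespace character onward
first_sub <- sub("\\s.*", "", x)

print(first_split)
print(first_sub)
```

Both should give the same first names here. If a string starts with whitespace, either approach returns an empty string for it, so calling trimws(x) first may be worth considering.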

A5C1D2H2I1M1N2O1R2T1 · Answer 2 · 2013-10-11T17:29:12.400

For what you want, here's a pretty unorthodox way to do it:

read.table(text = names, header = FALSE, stringsAsFactors=FALSE, fill = TRUE)[[1]]
# [1] "Bernice"  "Dianna"   "Philip"   "Laurie"   "Rochelle" "Arturo"   "Enrique"  "Sarah"   
# [9] "Darryl"   "Arthur"

edited Oct 11 '13 at 17:29

answered Oct 11 '13 at 16:53


nice! and in case of someone having a second name I'd suggest to set `fill=T` :) – Michele Oct 11 '13 at 17:27
@Michele, Thanks. I *had* intended to do that, but forgot to do so when posting. Will update now. – A5C1D2H2I1M1N2O1R2T1 Oct 11 '13 at 17:28
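To illustrate the point about `fill` raised in the comments, here is a small sketch. The vector `x` is an assumption made up for this example, with one entry that has a middle name:

```r
# Hypothetical input: the second entry has three whitespace-separated fields
x <- c("Bernice Ingram", "Mary Ann Smith")

# fill = TRUE pads the shorter row with blanks, so rows with
# different numbers of fields can be read into one data frame
parsed <- read.table(text = x, header = FALSE,
                     stringsAsFactors = FALSE, fill = TRUE)

# The first column holds the first whitespace-separated field of each entry
first <- parsed[[1]]
print(first)
```

Keep in mind that read.table treats apostrophes as quote characters by default, so a name such as O'Brien may need quote = "" to be read as expected.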

score 3 · Answer 3 · answered Oct 11 '13 at 15:25

This seems to work:

unlist(strsplit(names,' '))[seq(1,2*length(names),2)]

Assuming no first/last names have spaces in them.
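A short sketch of why that assumption matters, using a made-up vector `people` (not from the question) in which one entry has three words. Picking every other token drifts out of alignment after that entry, whereas taking the first piece per element does not:

```r
# Hypothetical input: the second entry breaks the "exactly two words" assumption
people <- c("Bernice Ingram", "Mary Ann Smith", "Arturo Fisher")

# Index-based approach: flatten all tokens, then take every other one
tokens <- unlist(strsplit(people, " "))
by_index <- tokens[seq(1, 2 * length(people), 2)]

# Per-element approach: take the first token of each split result
per_element <- vapply(strsplit(people, " "), `[`, character(1), 1)

print(by_index)     # third value is a surname, not a first name
print(per_element)
```

The per-element version is essentially the sapply approach from the accepted answer, with vapply used only to make the return type explicit.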

zzxx53


score 3 · Answer 4 · answered Oct 11 '13 at 15:26

Using a regular expression with gsub:

> gsub("^(.*?)\\s.*", "\\1", names)
 [1] "Bernice"  "Dianna"   "Philip"   "Laurie"   "Rochelle" "Arturo"   "Enrique"  "Sarah"   
 [9] "Darryl"   "Arthur"

Jilber Urbina


or `sub(' .*', '', names)` – eddi Oct 11 '13 at 15:52
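For comparison, a small sketch running the gsub pattern from the answer next to the shorter sub call from the comment. The vector `x` is hypothetical and includes a single-word name to show what happens when there is no space to match:

```r
# Hypothetical input: the second entry contains no space at all
x <- c("Bernice Ingram", "Cher")

# Capture everything before the first whitespace and keep only that group
with_gsub <- gsub("^(.*?)\\s.*", "\\1", x)

# Remove the first space and everything after it
with_sub <- sub(" .*", "", x)

print(with_gsub)
print(with_sub)
```

When the pattern does not match, as with the single-word entry, both calls leave the string unchanged, so no NA values are introduced.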
