I have a column of names in which the surnames are entirely upper case and the first names are lower case except for the first letter. How can I split them up? Example: BIDEN Joe

names <- c("BIDEN Joe", "DE WEERDT Jan", "SCHEPERS Caro")

The result I want is two vectors/columns, one of them containing the words in capital letters, so it becomes:

surnames <- c("BIDEN", "DE WEERDT", "SCHEPERS")

And in the other the first names:

first_names <- c("Joe", "Jan", "Caro")

Thanks in advance.

user438383
JoGa
  • It's easier to help you if you provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. It's hard to extrapolate from one example. Do any of the surnames or first names have additional spaces? – MrFlick Jun 28 '22 at 14:21
  • Okay, thank you for the hints. I have added some extra examples in the question. – JoGa Jun 28 '22 at 14:22
  • I especially have difficulty with surnames consisting of two parts separated by a space. – JoGa Jun 28 '22 at 14:27

2 Answers

Try this:

names <- c("BIDEN Joe", "DE WEERDT Jan", "SCHEPERS Caro")

# Remove everything from the leading capital up to the last space (the surname)
first_names <- sub("^[A-Z].+ ", "", names)
# "Joe"  "Jan"  "Caro"

# Remove the space, the capitalised first name and everything after it
# (.* rather than .+ so that two-letter first names also match)
last_names <- sub(" [A-Z][a-z].*$", "", names)
# "BIDEN"     "DE WEERDT" "SCHEPERS"

Also, I wouldn't call the vector names, as that is the name of a base R function.
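
If a first name can itself contain a space, the greedy pattern above keeps only its last word. Here is a minimal sketch of one alternative, assuming every surname word is fully upper case and the first name starts with a capital followed by a lower-case letter. The vector name full_names and the VAN DAMME entry are made up for illustration:

```r
# Hypothetical input; the last entry has a two-word first name
full_names <- c("BIDEN Joe", "DE WEERDT Jan", "VAN DAMME Jean Claude")

# Position of the space in front of the first capitalised (not all-caps) word
split_at <- as.integer(regexpr(" [A-Z][a-z]", full_names))

# Everything before that space is the surname, everything after it the first name
surnames    <- substr(full_names, 1, split_at - 1)
first_names <- substr(full_names, split_at + 1, nchar(full_names))

surnames
# [1] "BIDEN"     "DE WEERDT" "VAN DAMME"
first_names
# [1] "Joe"         "Jan"         "Jean Claude"
```

This only looks for the first place where an upper-case letter is followed by a lower-case one, so it does not depend on how many words come before or after that point.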

SamR

You can use capture groups to split the string. For example:

names <- c("BIDEN Joe", "DE WEERDT Jan", "SCHEPERS Caro")
m <- regexec("([A-Z ]+) ([A-Z].*)", names, perl = TRUE)
parts <- regmatches(names, m)
parts
# [[1]]
# [1] "BIDEN Joe" "BIDEN"     "Joe"      
# [[2]]
# [1] "DE WEERDT Jan" "DE WEERDT"     "Jan"          
# [[3]]
# [1] "SCHEPERS Caro" "SCHEPERS"      "Caro"

# Last Names
sapply(parts, `[`, 2)
# [1] "BIDEN"     "DE WEERDT" "SCHEPERS" 
# First Names
sapply(parts, `[`, 3)
# [1] "Joe"  "Jan"  "Caro"
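
Since the question mentions columns, the same capture groups can also be returned as a data frame. This is a sketch using strcapture() from the utils package (R 3.4.0 or later), with the pattern anchored at both ends. The column names surname and first_name and the vector name full_names are my own choices:

```r
full_names <- c("BIDEN Joe", "DE WEERDT Jan", "SCHEPERS Caro")

# proto supplies the column names and types for the two capture groups
parts_df <- strcapture(
  "^([A-Z ]+) ([A-Z].*)$",
  full_names,
  proto = data.frame(surname = character(), first_name = character())
)

parts_df$surname
# [1] "BIDEN"     "DE WEERDT" "SCHEPERS"
parts_df$first_name
# [1] "Joe"  "Jan"  "Caro"
```

Entries that do not match the pattern come back as NA, which makes malformed names easy to spot.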
MrFlick