I'd like to extract everything after the "-" in each element of a vector of strings in R.
For example, given:
test = c("Pierre-Pomme","Jean-Poire","Michel-Fraise")
I'd like to get
c("Pomme","Poire","Fraise")
Thanks !
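With str_extract. \\b is a zero-length token that matches a word boundary, i.e. the position between a word character and a non-word character such as -. The pattern below therefore matches the last run of word characters in each string: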
library(stringr)
str_extract(test, '\\b\\w+$')
# [1] "Pomme" "Poire" "Fraise"
We can also use a back reference with sub. \\1 refers to the string matched by the first capture group (.+), which here is one or more of any character after the -, running to the end of the string:
sub('.+-(.+)', '\\1', test)
# [1] "Pomme" "Poire" "Fraise"
This also works with str_replace if stringr is already loaded:
library(stringr)
str_replace(test, '.+-(.+)', '\\1')
# [1] "Pomme" "Poire" "Fraise"
A third option is to use strsplit and extract the second piece from each element of the resulting list (similar to word from @akrun's answer):
sapply(strsplit(test, '-'), `[`, 2)
# [1] "Pomme" "Poire" "Fraise"
stringr also has a str_split variant of this:
str_split(test, '-', simplify = TRUE)[,2]
# [1] "Pomme" "Poire" "Fraise"
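The split-and-index approach above assumes every string contains exactly one hyphen. As a rough sketch of what happens otherwise (the vector x and the names parts, second and last are made up for illustration, not from the question), base R behaves like this:

```r
# Hypothetical input: one element has no hyphen, one has two
x <- c("Pierre-Pomme", "Jean", "Anne-Marie-Poire")

# fixed = TRUE treats "-" as a literal string rather than a regex
parts <- strsplit(x, "-", fixed = TRUE)

# Second piece of each element; NA where there is no second piece
second <- vapply(parts, function(p) p[2], character(1))

# Last piece of each element; the whole string when there is no hyphen
last <- vapply(parts, function(p) p[length(p)], character(1))

print(second)
print(last)
```

vapply is used instead of sapply only so that the result is guaranteed to be a character vector; which of the two behaviours is wanted depends on the data.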
We can use sub to match all characters (.*) up to and including the - and specify "" as the replacement:
sub(".*-", "", test)
Another option is word:
library(stringr)
word(test, 2, sep="-")
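One thing worth knowing about the sub pattern is that .* is greedy, so with several hyphens it removes everything up to the last one. A small sketch, assuming a made-up vector x with a double-hyphen entry and an entry without any hyphen (neither is in the original question):

```r
# Hypothetical input: one element has two hyphens, one has none
x <- c("Pierre-Pomme", "Jean-Pierre-Poire", "Fraise")

# Greedy .* consumes up to the LAST hyphen
after_last <- sub(".*-", "", x)

# [^-]* cannot cross a hyphen, so this stops at the FIRST hyphen
after_first <- sub("^[^-]*-", "", x)

print(after_last)
print(after_first)
```

In both cases a string without a hyphen is returned unchanged, because sub leaves non-matching elements alone.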
I think the other answers might be what you're looking for, but if you don't want to lose the original context you can try something like this:
library(tidyverse)
tibble(test) %>%
  separate(test, c("first", "last"), remove = FALSE)
This will return a data frame containing the original strings plus their components, which might be more useful down the road:
# A tibble: 3 x 3
  test          first  last
  <chr>         <chr>  <chr>
1 Pierre-Pomme  Pierre Pomme
2 Jean-Poire    Jean   Poire
3 Michel-Fraise Michel Fraise
For some reason the responses here didn't work for my particular string. I found this answer more helpful (it uses a regex lookbehind with stringr): "stringr str_extract capture group capturing everything".
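For reference, a lookbehind version might look roughly like the following. This is a sketch in base R, not the code from the linked answer; the pattern (?<=-).* means "everything that is preceded by a -", and it needs perl = TRUE in base R.

```r
test <- c("Pierre-Pomme", "Jean-Poire", "Michel-Fraise")

# (?<=-) is a zero-width lookbehind: it requires a "-" just before the
# match but does not include it in the result
m <- regexpr("(?<=-).*", test, perl = TRUE)

# regmatches silently drops elements that have no match
after_dash <- regmatches(test, m)

print(after_dash)
```

The same pattern can be passed to stringr as str_extract(test, "(?<=-).*"), which returns NA for elements without a hyphen instead of dropping them.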