16

I'd like to extract everything after "-" in a vector of strings in R.

For example, in:

test = c("Pierre-Pomme","Jean-Poire","Michel-Fraise")

I'd like to get

c("Pomme","Poire","Fraise")

Thanks!

NelsonGon
  • See also: [Extract a substring according to a pattern](https://stackoverflow.com/questions/17215789) – GKi Jun 14 '23 at 06:59

4 Answers

20

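With str_extract from stringr. \\b is a zero-width assertion that matches at a word boundary, i.e. between a word character and a non-word character such as -. The pattern \\b\\w+$ therefore matches the last run of word characters at the end of each string: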

library(stringr)
str_extract(test, '\\b\\w+$')
# [1] "Pomme"  "Poire"  "Fraise"

We can also use a back-reference with sub. \\1 refers to the text matched by the first capture group (.+), i.e. one or more characters following a -. Because the leading .+ is greedy, this is everything after the last - in the string:

sub('.+-(.+)', '\\1', test)
# [1] "Pomme"  "Poire"  "Fraise"

This also works with str_replace if stringr is already loaded:

library(stringr)
str_replace(test, '.+-(.+)', '\\1')
# [1] "Pomme"  "Poire"  "Fraise"

A third option is to use strsplit and extract the second element from each element of the resulting list (similar to word in @akrun's answer):

sapply(strsplit(test, '-'), `[`, 2)
# [1] "Pomme"  "Poire"  "Fraise"

stringr also has a str_split variant of this:

str_split(test, '-', simplify = TRUE)[,2]
# [1] "Pomme"  "Poire"  "Fraise"
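
The approaches above agree on the question's data but can differ once a string contains more than one hyphen. A minimal sketch in base R, assuming a hypothetical input `x` that is not part of the question:

```r
# Hypothetical input with more than one hyphen (not from the question)
x <- c("Pierre-Pomme", "Jean-Paul-Poire")

# The greedy '.+-' runs up to the LAST hyphen, so only the final piece is kept
last_piece <- sub('.+-(.+)', '\\1', x)
# [1] "Pomme" "Poire"

# strsplit() breaks at EVERY hyphen; index 2 is the second piece, not the last
second_piece <- sapply(strsplit(x, '-'), `[`, 2)
# [1] "Pomme" "Paul"

# To always take the last piece after splitting, pick the final element
last_by_split <- sapply(strsplit(x, '-'), function(p) p[length(p)])
# [1] "Pomme" "Poire"
```

Which result is wanted depends on whether "after the -" means the first or the last hyphen in the data.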
acylam
17

We can use sub to match all characters (.*) up to and including the last - and replace them with "":

sub(".*-", "", test)

Another option is word from stringr:

library(stringr)
word(test, 2, sep="-")
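
As a hedged illustration of how the sub pattern behaves on other inputs, the sketch below uses a hypothetical vector `x` (not from the question) with one, two, and zero hyphens. The anchored pattern `^[^-]*-` is a variant added here for comparison, not part of the original answer:

```r
# Hypothetical inputs: one hyphen, two hyphens, no hyphen
x <- c("Pierre-Pomme", "Jean-Paul-Poire", "Anna")

# Greedy .* removes everything up to the LAST hyphen
after_last <- sub(".*-", "", x)
# [1] "Pomme" "Poire" "Anna"

# [^-]* cannot cross a hyphen, so this removes only up to the FIRST hyphen
after_first <- sub("^[^-]*-", "", x)
# [1] "Pomme"      "Paul-Poire" "Anna"
```

In both cases a string without a hyphen is returned unchanged, because sub leaves the input as-is when the pattern does not match.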
akrun
5

I think the other answers might be what you're looking for, but if you don't want to lose the original context you can try something like this:

library(tidyverse)

tibble(test) %>% 
    separate(test, c("first", "last"), remove = FALSE)

This will return a data frame containing the original strings plus their components, which might be more useful down the road:

# A tibble: 3 x 3
  test          first  last  
  <chr>         <chr>  <chr> 
1 Pierre-Pomme  Pierre Pomme 
2 Jean-Poire    Jean   Poire 
3 Michel-Fraise Michel Fraise
  • How do you specify where to separate the text if there were more than one "-" in the test column? – ORStudent Sep 08 '20 at 08:06
  • @ORStudent you can try using more complex regex in the `sep` argument. You can also use integers to specify exact positions, which means you can use something like `str_locate_all` to find all occurrences of a separator and then specify which one, exactly, should be separated on. –  Sep 09 '20 at 11:21
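
One way to handle the multiple-hyphen case raised in the comments is tidyr's `extra = "merge"` argument, which splits at most `length(into)` times. This is a sketch under assumptions: the data frame `df` and its values are hypothetical, and it splits on the first hyphen only, which may or may not be what a given data set needs:

```r
library(tidyr)
library(tibble)

# Hypothetical data: the second string contains two hyphens
df <- tibble(test = c("Pierre-Pomme", "Jean-Paul-Poire"))

# With two target columns and extra = "merge", only the first "-" is used
# as a split point; the rest of the string stays together in "last"
res <- separate(df, test, c("first", "last"),
                sep = "-", extra = "merge", remove = FALSE)

res$first
# [1] "Pierre" "Jean"
res$last
# [1] "Pomme"      "Paul-Poire"
```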
0

For some reason the answers here didn't work for my particular string. I found this answer more helpful (it uses a regex lookbehind with stringr): stringr str_extract capture group capturing everything.
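
The linked answer's exact pattern is not reproduced here, so the following is only a sketch of what a lookbehind could look like on the question's data. The pattern `(?<=-).*` is an assumption chosen for illustration: `(?<=-)` requires a - immediately before the match without including it, and `.*` takes the rest of the string:

```r
library(stringr)

test <- c("Pierre-Pomme", "Jean-Poire", "Michel-Fraise")

# Match everything that is preceded by "-", without capturing the "-" itself
with_stringr <- str_extract(test, "(?<=-).*")
# [1] "Pomme"  "Poire"  "Fraise"

# Base R equivalent; perl = TRUE is needed for lookbehind support
with_base <- regmatches(test, regexpr("(?<=-).*", test, perl = TRUE))
# [1] "Pomme"  "Poire"  "Fraise"
```

If a string contains several hyphens, this pattern returns everything after the first one.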

rtk19
  • This answer could be more helpful if you could kindly provide a short summary of your reference and a simple showcase. – X Zhang Nov 10 '22 at 00:46