
I would like to capture the characters between the 1st and 2nd occurrence of '_' in this string:

C2_Sperd20A_XXX_20170301_20170331

That is:

Sperd20A

Thank you

Axeman
  • This can help: https://stackoverflow.com/q/42354232/4137985 ; see also: https://stackoverflow.com/a/23504780/4137985 – Cath Jun 15 '17 at 09:14

2 Answers


We can use sub with a pattern that matches, from the start of the string (^), zero or more characters that are not a _ ([^_]*), then a _, then one or more characters that are not a _, captured as a group (([^_]+)), then another _ and the rest of the string (.*). The whole match is replaced with the backreference (\\1) to the captured group:

sub("^[^_]*_([^_]+)_.*", "\\1", str1)
#[1] "Sperd20A"

Or between the 2nd and 3rd _

sub("^([^_]*_){2}([^_]+).*", "\\2", str1)
#[1] "XXX"

Or another option is strsplit

strsplit(str1, "_")[[1]][2]
#[1] "Sperd20A"

If it is between 2nd and 3rd _

strsplit(str1, "_")[[1]][3]
#[1] "XXX"
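The strsplit approach extends to a whole vector of strings. The following is a minimal sketch, not part of the original answer: nth_field is a hypothetical helper name, and the choice to return NA when a string has fewer than n fields is an assumption made for illustration.

```r
# Hypothetical helper: return the n-th sep-delimited field of each string,
# or NA when a string has fewer than n fields.
nth_field <- function(x, n, sep = "_") {
  parts <- strsplit(x, sep, fixed = TRUE)  # fixed = TRUE: treat sep literally
  vapply(parts,
         function(p) if (length(p) >= n) p[n] else NA_character_,
         character(1))
}

strs <- c("C2_Sperd20A_XXX_20170301_20170331", "A_B", "nounderscore")
nth_field(strs, 2)
#[1] "Sperd20A" "B"        NA
nth_field(strs, 3)
#[1] "XXX" NA    NA
```

Note that this counts fields rather than delimiters, so "A_B" yields "B" even though there is no second _. The sub pattern above requires the second _ and returns the input unchanged when it does not match.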

###data

str1 <- "C2_Sperd20A_XXX_20170301_20170331"
akrun
  • is there a matching stringr::str_extract syntax to find a string between the 2nd and 3rd _ in this example (like: strsplit(str1, "_")[[1]][3])? – Alex Dometrius Aug 29 '18 at 20:49
  • @AlexDometrius Try with `str_replace` `str_replace(str1, "^[^_]+_[^_]+_([^_]+)_.*", "\\1") #[1] "XXX"` – akrun Aug 29 '18 at 20:54
  • What's the purpose of the first asterisk in the expression `sub("*([^_]*_){2}([^_]+).*", "\\2", str1)`? – Dan May 10 '22 at 09:36
  • @Lyngbakr I guess it was a typo. I may have intended `^` (updated) – akrun May 10 '22 at 14:58
1

A good option is to use the stringr package:

library(stringr)
s <- "C2_Sperd20A_XXX_20170301_20170331"

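# (?<=_) is a lookbehind: the match must be preceded by "_"
# (?=_) is a lookahead: the match must be followed by "_"
# .*? is lazy, so the first match ends at the next "_"
str_extract(string = s, pattern = "(?<=_)(.*?)(?=_)")
#[1] "Sperd20A"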
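For the text between the 2nd and 3rd _ (the case raised in the comments on the other answer), a stringr version might look like the sketch below. It is not part of the original answer; it assumes the stringr package is installed and uses str_match and str_split, with patterns chosen here for illustration.

```r
library(stringr)
s <- "C2_Sperd20A_XXX_20170301_20170331"

# Skip two "field_" prefixes with a non-capturing group, then capture the
# third field; column 2 of the str_match result holds the first capture group.
str_match(s, "^(?:[^_]*_){2}([^_]+)")[, 2]
#[1] "XXX"

# Alternatively, split on "_" and index the third piece.
str_split(s, "_")[[1]][3]
#[1] "XXX"
```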
Samuel