How to extract everything until first occurrence of pattern

Question

I'm trying to use the stringr package in R to extract everything from a string up until the first occurrence of an underscore.

What I've tried

str_extract("L0_123_abc", ".+?(?<=_)")
> "L0_"

Close, but no cigar: the match still includes the underscore. How do I get just "L0"? Ideally, I'd also like something that's easy to extend, so that I can get the text between the 1st and 2nd underscores and the text after the 2nd underscore.

score 72 · Accepted Answer · answered Oct 18 '16 at 17:26

To get L0, you may use

> library(stringr)
> str_extract("L0_123_abc", "[^_]+")
[1] "L0"

The [^_]+ matches 1 or more chars other than _.

Also, you may split the string with _:

> x <- str_split("L0_123_abc", fixed("_"))
> x
[[1]]
[1] "L0"  "123" "abc"

This way, you will have all the substrings you need.

The same can be achieved with

> str_extract_all("L0_123_abc", "[^_]+")
[[1]]
[1] "L0"  "123" "abc"
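
Since the question also asks for the piece between the 1st and 2nd underscores and the piece after the 2nd one, here is a minimal sketch of pulling out the n-th piece after splitting. It uses base R's strsplit rather than stringr so it runs without extra packages; the input vector `ids` and the helper `nth_piece` are hypothetical names made up for illustration, not part of any package.

```r
# Hypothetical input vector, for illustration only
ids <- c("L0_123_abc", "L1_456_def")

# Split every string on the literal underscore (fixed = TRUE: no regex needed)
parts <- strsplit(ids, "_", fixed = TRUE)

# Hypothetical helper: take the n-th piece of each split result.
# Indexing past the end yields NA, so strings with fewer pieces give NA.
nth_piece <- function(pieces, n) {
  vapply(pieces, function(p) p[n], character(1))
}

first  <- nth_piece(parts, 1)  # text before the 1st underscore
second <- nth_piece(parts, 2)  # text between the 1st and 2nd underscores
third  <- nth_piece(parts, 3)  # text after the 2nd underscore
```

With stringr loaded, `str_split_fixed(ids, "_", n = 3)` should give the same pieces as the columns of a character matrix.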

score 12 · Answer 2 · answered Oct 18 '16 at 16:58

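The lookaround should be a lookahead, (?=_), rather than a lookbehind, (?<=_), so that the underscore is checked for but not included in the match: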

str_extract("L0_123_abc", ".+?(?=_)")
#[1] "L0"

akrun

What does the first `?` do in this pattern? – its.me.adam Feb 15 '23 at 18:07
it is needed as without it I get `"L0_123"` – its.me.adam Feb 15 '23 at 18:40
@its.me.adam Sorry, didn't test it before commenting. Without it, `.+` is a greedy match. You can check [here](https://www.regular-expressions.info/optional.html) – akrun Feb 15 '23 at 19:11
So the `?` has the pattern match both `any character` and `any character at least once`. But I don't understand how this prevents matching `"L0_123"`. – its.me.adam Feb 15 '23 at 19:24
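
To make the point in the comments above concrete: a `?` placed directly after `+` does not mean "optional"; it makes the quantifier lazy, so `.+?` takes as few characters as possible while `.+` takes as many as possible. The sketch below compares the variants side by side. It uses base R's regexpr with perl = TRUE as a stand-in for str_extract so that it runs without stringr; the variable names are illustrative only.

```r
x <- "L0_123_abc"

# Greedy: .+ grabs as much as it can, then backs off to the LAST position
# that is followed by an underscore
greedy <- regmatches(x, regexpr(".+(?=_)", x, perl = TRUE))

# Lazy: .+? grows one character at a time and stops at the FIRST position
# that is followed by an underscore
lazy <- regmatches(x, regexpr(".+?(?=_)", x, perl = TRUE))

# The lookbehind from the question: the match must END just after an
# underscore, so the underscore itself is part of the result
lookbehind <- regmatches(x, regexpr(".+?(?<=_)", x, perl = TRUE))

print(c(greedy = greedy, lazy = lazy, lookbehind = lookbehind))
# greedy is "L0_123", lazy is "L0", lookbehind is "L0_"
```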

score 9 · Answer 3 · answered Oct 18 '16 at 16:57

Using gsub...

gsub("(.+?)(\\_.*)", "\\1", "L0_123_abc")

jmartindill

score 7 · Answer 4 · answered Jun 17 '20 at 11:43

You can use sub from base R with the pattern _.*, which matches everything from the first _ onward, and replace that with an empty string.

sub("_.*", "", "L0_123_abc")
#[1] "L0"

Or use [^_], which matches any character except _.

sub("([^_]*).*", "\\1", "L0_123_abc")
#[1] "L0"

Or use substr with regexpr.

substr("L0_123_abc", 1, regexpr("_", "L0_123_abc")-1)
#substr("L0_123_abc", 1, regexpr("_", "L0_123_abc", fixed=TRUE)-1) #More performant alternative
#[1] "L0"
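
One difference between these approaches shows up when a string contains no underscore at all. The sketch below, using a made-up second input for illustration, compares the sub variant with the substr/regexpr variant: sub leaves the string unchanged, whereas regexpr returns -1 when nothing is found, so substr is asked for characters 1 to -2 and returns an empty string.

```r
# Second element is a hypothetical input with no underscore
x <- c("L0_123_abc", "nounderscore")

# sub: nothing to replace when there is no underscore, so the input is kept
via_sub <- sub("_.*", "", x)

# substr + regexpr: regexpr gives -1 for "no match", so stop = -2 and
# substr returns "" for that element
via_substr <- substr(x, 1, regexpr("_", x, fixed = TRUE) - 1)

print(via_sub)
print(via_substr)
```

Which behaviour is preferable depends on whether an input without an underscore should count as "all prefix" or "no prefix".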

edited Jun 18 '20 at 09:08

answered Jun 17 '20 at 11:43

GKi
