R - Get substring between first occurrence and last occurrence

Question

I'm working with long strings in R such as:

string <- "end of section. 3. LESSONS. The previous LESSONS are very important as seen in Figure 1. This text is also important. Figure 1: Blah blah blah"

I would like to extract the substring between the first occurrence of 'LESSONS' and the last occurrence of 'Figure 1', as follows:

"The previous LESSONS are very important as seen in Figure 1. This text is also important."

I tried the following, but it returns the substring after the last occurrence of 'LESSONS', not the first:

gsub(".*LESSONS (.*) Figure 1.*", "\\1", string)
#[1] "are very important as seen in Figure 1. This text is also important."

I also tried the following, but it cuts the string at the first occurrence of 'Figure 1', not the last:

library(qdapRegex)
ex_between(string, "LESSONS", "Figure 1")
#[[1]]
#[1] ". The previous LESSONS are very important as seen in"

I'd appreciate any help!

Ronak Shah · Accepted Answer · 2020-07-09T06:03:49.187

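You were very close. Make the .* before "LESSONS" non-greedy (.*?) so that it stops at the first occurrence instead of the last. The capture group (.*) stays greedy, so it still runs up to the last " Figure 1".

Also, since only one replacement is made, sub is enough here; gsub is not needed.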

sub(".*?LESSONS\\.\\s*(.*) Figure 1.*", "\\1", string)
#[1] "The previous LESSONS are very important as seen in Figure 1. This text is also important."
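To make the difference visible, here is a minimal side-by-side sketch of the greedy and non-greedy prefixes on the string from the question. The variable names greedy and lazy are only illustrative, and perl = TRUE is my own addition so that the lazy quantifier follows PCRE semantics; the call above does not use it.

```r
string <- "end of section. 3. LESSONS. The previous LESSONS are very important as seen in Figure 1. This text is also important. Figure 1: Blah blah blah"

# Greedy prefix: ".*" consumes as much as possible, so the literal
# "LESSONS " in the pattern ends up on the last occurrence.
greedy <- sub(".*LESSONS (.*) Figure 1.*", "\\1", string, perl = TRUE)

# Lazy prefix: ".*?" consumes as little as possible, so "LESSONS."
# ends up on the first occurrence. The capture group is still greedy
# and therefore extends to the last " Figure 1".
lazy <- sub(".*?LESSONS\\.\\s*(.*) Figure 1.*", "\\1", string, perl = TRUE)

print(greedy)
print(lazy)
```

Only the prefix changes between the two calls; the greedy capture group is what keeps the match running to the last 'Figure 1' in both.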

edited Jul 09 '20 at 06:03

answered Jul 09 '20 at 06:00

Ronak Shah


You're the best, Ronak! Thanks! – Oliver Peña-Habib Jul 09 '20 at 06:03
Done! Thank you so much. – Oliver Peña-Habib Jul 09 '20 at 16:42

score 0 · Answer 2 · answered Jul 09 '20 at 06:08

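You can use str_extract from the stringr package together with a positive lookbehind, (?<=...), and a positive lookahead, (?=...), to define the parts of the string that delimit what you want to extract. Because .* is greedy, the match runs from just after the first "LESSONS. " up to the last " Figure 1":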

library(stringr)
str_extract(string, "(?<=LESSONS\\.\\s).*(?=\\sFigure 1)")
#[1] "The previous LESSONS are very important as seen in Figure 1. This text is also important."
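If you would rather not load stringr, the same lookaround pattern should also work in base R. This is a sketch, assuming perl = TRUE (lookarounds need the PCRE engine) and using regexpr with regmatches; the names m and result are only illustrative.

```r
string <- "end of section. 3. LESSONS. The previous LESSONS are very important as seen in Figure 1. This text is also important. Figure 1: Blah blah blah"

# regexpr() locates the first match; perl = TRUE enables lookbehind and lookahead.
# The lookbehind is fixed-length ("LESSONS." plus one whitespace), as PCRE requires.
m <- regexpr("(?<=LESSONS\\.\\s).*(?=\\sFigure 1)", string, perl = TRUE)

# regmatches() extracts the matched text at the position found above.
result <- regmatches(string, m)
print(result)
```

Note that regmatches returns character(0) rather than NA when nothing matches, so check the length of the result before using it.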

answered Jul 09 '20 at 06:08

Chris Ruehlemann


Thank you! I know `stringr` but had not used those delimiters, this was super helpful. – Oliver Peña-Habib Jul 09 '20 at 16:46
Please consider upvoting if you feel this answer was useful for you. – Chris Ruehlemann Jul 09 '20 at 21:59
