Regex: Capture everything between two strings except those strings

Question

I have the following texts:

This text appears at p.UNWANTED_TEXT72

This text appears between pp.UNWANTED_TEXT12-14

I want to select the text between p. and the first digit that follows it, and remove that text:

Here is what I want:

This text appears at p.72.

This text appears between pp.12-14.

The following expression captures the boundaries as well:

p\.(.*?)\d

How can I exclude the boundaries from selection?

There are many different flavors of Regex. That is why the tool/programming language should be in the question (as explained if you hover over the regex tag). I don't know R so I'm not sure if it allows look-arounds, but you could try `(?<=p\.)(.*?)(?=\d)`. If that works it's a dupe of https://stackoverflow.com/questions/6109882/regex-match-all-characters-between-two-strings — Ivar, Jan 13 '19 at 01:23
It matters because different regex engines have different feature sets. Specifically for your case one could use zero-width look-behind assertions, but these are not supported by all dialects. If they are not supported but look-ahead is, you can still use just the look-ahead for the digit and replace the matched occurrences with `p.`. — Lucero, Jan 13 '19 at 01:23
R is using ERE (default) or Perl regexes. What are you using? — hek2mgl, Jan 13 '19 at 01:29
Yes, it looks like it depends on the program. When I use the `gsub` function in R with `perl=TRUE`, @Ivar's solution works; without `perl=TRUE` it does not. Working code in R: `gsub(pattern = "(?<=p\\.)(.*?)(?=\\d)", replacement = "", x = "This text appears at p.UNWANTED_TEXT72", perl = TRUE)`. Thanks. — HBat, Jan 13 '19 at 01:34
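
To make the comments above concrete, here is a minimal sketch in R that runs the look-around pattern on both sample strings from the question. It also shows a capture-group variant, which is not from the thread: it is an alternative that keeps `p.` in a group and puts it back in the replacement, so it should not need `perl = TRUE`. The variable names `x`, `lookaround` and `captured` are only illustrative.

```r
# Inputs taken from the question.
x <- c("This text appears at p.UNWANTED_TEXT72",
       "This text appears between pp.UNWANTED_TEXT12-14")

# Look-around version (needs the Perl-compatible engine, hence perl = TRUE).
# (?<=p\.) asserts that "p." sits just before the match;
# (?=\d)   asserts that a digit follows it.
# Neither assertion consumes characters, so only the text in between is removed.
lookaround <- gsub("(?<=p\\.).*?(?=\\d)", "", x, perl = TRUE)

# Capture-group version (an alternative, not proposed in the thread):
# match "p." plus any run of non-digits, then put only "p." back via \\1.
captured <- gsub("(p\\.)[^0-9]*", "\\1", x)

print(lookaround)
print(captured)
```

One difference worth keeping in mind: the look-around pattern only matches when a digit actually follows, whereas the capture-group pattern removes every non-digit after `p.` up to the end of the string if no digit is present.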

score 2 · Accepted Answer · answered Jan 13 '19 at 02:07

xehpuk
