
I have the following texts:

This text appears at p.UNWANTED_TEXT72

This text appears between pp.UNWANTED_TEXT12-14

I want to select the text between p. and the first digit that occurs, and remove it:

Here is what I want:

This text appears at p.72

This text appears between pp.12-14

The following expression finds the text, but its match also includes the boundaries (the p. and the digit):

p\.(.*?)\d

How can I exclude the boundaries from selection?


HBat
  • Which tool/programming language are you using? – Ivar Jan 13 '19 at 01:15
  • R, but does that matter? – HBat Jan 13 '19 at 01:20
  • There are many different flavors of Regex. That is why the tool/programming language should be in the question (as explained if you hover over the regex tag). I don't know R so I'm not sure if it allows look-arounds, but you could try `(?<=p\.)(.*?)(?=\d)`. If that works it's a dupe of https://stackoverflow.com/questions/6109882/regex-match-all-characters-between-two-strings – Ivar Jan 13 '19 at 01:23
  • It matters because different regex engines have different feature sets. Specifically, for your case one could use zero-width look-behind assertions, but those are not supported by all dialects. If they are not supported but look-ahead is, you can still use just the look-ahead for the digit and replace the matched occurrences with `p.`. – Lucero Jan 13 '19 at 01:23
  • R is using ERE (default) or Perl regexes. What are you using? – hek2mgl Jan 13 '19 at 01:29
  • Yes, it looks like it depends on the program. When I use the `gsub` function in R with `perl=TRUE`, @Ivar's solution works; otherwise it does not. Working code in R: `gsub(pattern = "(?<=p\\.)(.*?)(?=\\d)", replacement = "", x = "This text appears at p.UNWANTED_TEXT72", perl = TRUE)`. Thanks. – HBat Jan 13 '19 at 01:34
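As the comments note, R's default engine (`perl = FALSE`) does not support look-arounds. If you would rather stay with the default engine, one possible workaround (a capture-group technique swapped in here, not the look-ahead Lucero describes) is to capture the `p.` boundary in a group, consume the non-digits after it, and write the group back with a backreference. This is a minimal sketch; the variable names `x` and `cleaned_default` are illustrative only.

```r
# The two sample strings from the question (illustrative variable name)
x <- c("This text appears at p.UNWANTED_TEXT72",
       "This text appears between pp.UNWANTED_TEXT12-14")

# Default engine (perl = FALSE), so no look-arounds:
# group 1 captures the literal "p.", [^0-9]* consumes the following
# non-digits, and the replacement "\\1" puts only "p." back.
cleaned_default <- gsub("(p\\.)[^0-9]*", "\\1", x)

print(cleaned_default)
```

Because the boundary is re-inserted rather than excluded from the match, this behaves like the look-behind version for these inputs without needing `perl = TRUE`.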

1 Answer


You need a positive lookbehind and a negated (shorthand) character class: (?<=p\.)\D+
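A minimal sketch of applying this pattern in R, modeled on the asker's working `gsub` call in the comments. It assumes `perl = TRUE`, since the default engine has no look-behind; the vector `x` simply holds the two sample strings from the question.

```r
# The two sample strings from the question (illustrative variable name)
x <- c("This text appears at p.UNWANTED_TEXT72",
       "This text appears between pp.UNWANTED_TEXT12-14")

# (?<=p\.) asserts that "p." precedes the match without consuming it;
# \D+ then matches the run of non-digits up to the first digit.
cleaned <- gsub("(?<=p\\.)\\D+", "", x, perl = TRUE)

print(cleaned)
```

Note that `\D+` stops only at a digit, so a `p.` with no digit after it would lose the rest of the string.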

xehpuk