
I'm trying to use a regex to extract a string. It works here: https://regexr.com/3vsd4

But when I run something similar in R, it fails:

```r
library(stringr)

m = "(?<=~* )([ AP_])\\w+"
x = "XY_O ~ R_Z + YPP_L_WINTER + AP_C"
str_match(x, m)[1, 1]
```

This gives the error:

```
Error in stri_match_first_regex(string, pattern, opts_regex = opts(pattern)) : Look-Behind pattern matches must have a bounded maximum length. (U_REGEX_LOOK_BEHIND_LIMIT)
```

How can I edit the R code to return "AP_C"?

Rafael
  • Possible duplicate of [What's the technical reason for "lookbehind assertion MUST be fixed length" in regex?](https://stackoverflow.com/questions/3796436/whats-the-technical-reason-for-lookbehind-assertion-must-be-fixed-length-in-r) – jramm Sep 20 '18 at 23:24
  • Well, your `~*` is not bounded. You could change it to `~{0,100} {0,100}`, but that won't narrow anything down, since zero repetitions are allowed. – CertainPerformance Sep 20 '18 at 23:36
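Following that comment, ICU (the regex engine stringr uses) accepts a look-behind as long as its maximum length is finite. This is a minimal sketch, assuming the `{0,100}` cap from the comment (100 is an arbitrary bound). Because `~{0,100}` can match zero tildes, the bounded pattern behaves exactly like one that only requires a preceding space, which is the commenter's point:

```r
library(stringr)

x <- "XY_O ~ R_Z + YPP_L_WINTER + AP_C"

# Bounded look-behind: up to 100 tildes, then one space.
# ICU accepts it because the look-behind has a finite maximum length.
bounded <- "(?<=~{0,100} )([ AP_])\\w+"

# ~{0,100} may match zero tildes, so this is equivalent to
# requiring only a single preceding space.
space_only <- "(?<= )([ AP_])\\w+"

bounded_hit <- str_match(x, bounded)[1, 1]
space_hit <- str_match(x, space_only)[1, 1]
print(bounded_hit)
print(space_hit)
```

Both calls return `"AP_C"` for this input, because "A" is the only character after a space that belongs to the class `[ AP_]`.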

1 Answer


Your question implies that the goal of your code is to match `AP_C`. If that is the case, a look-behind isn't required: you can pull the captured subpattern out of the matrix that `str_match` returns.

Without knowing all of the possible formats of your strings, something like this should work for the example you provided:

```r
library(stringr)

m = "~.*? (AP_\\w+)"
x = "XY_O ~ R_Z + YPP_L_WINTER + AP_C"
str_match(x, m)[1, 2]  # column 2 is the capture group: "AP_C"
```
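`str_match()` returns a character matrix with one row per input string. Column 1 holds the whole match and each further column holds one capture group. A short sketch with the answer's pattern shows why the column index matters:

```r
library(stringr)

m <- "~.*? (AP_\\w+)"
x <- "XY_O ~ R_Z + YPP_L_WINTER + AP_C"

res <- str_match(x, m)
whole_match <- res[1, 1]  # everything from "~" up to and including AP_C
group_1 <- res[1, 2]      # only the text captured by (AP_\\w+)
print(whole_match)
print(group_1)
```

Here `whole_match` is `"~ R_Z + YPP_L_WINTER + AP_C"`, while `group_1` is `"AP_C"`.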

There are probably other regex possibilities depending on your format requirements.
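Two such options follow, as a sketch that assumes the wanted terms always start with `AP_` and are separated by spaces or other non-word characters. A word boundary `\\b` avoids both the look-behind and the capture group, and `str_extract_all()` returns every matching term when a formula contains more than one. The second input string `y` is hypothetical and exists only for illustration:

```r
library(stringr)

x <- "XY_O ~ R_Z + YPP_L_WINTER + AP_C"

# \\b anchors the match at the start of a word, so "AP_" inside a
# longer name (e.g. "YAP_X") would not match.
first_ap <- str_extract(x, "\\bAP_\\w+")

# Hypothetical input with two AP_ terms, to show the _all variant.
y <- "XY_O ~ AP_A + R_Z + AP_C"
all_ap <- str_extract_all(y, "\\bAP_\\w+")[[1]]

print(first_ap)
print(all_ap)
```

`first_ap` is `"AP_C"` and `all_ap` is `c("AP_A", "AP_C")`.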

jramm