Regex match similar pattern string but I need the last occurrence

Question

I have the following text

">UNWANTEDTEXT">APRODUCT</ProductCode>

I'm looking to build a regex statement with my desired result being the text

APRODUCT

The regex I have at the moment is this.

">(.*?)<\/ProductCode>

The problem I'm facing is that the same "> pattern also occurs at the start of the text, so my regex captures UNWANTEDTEXT">APRODUCT instead. I need a way of telling the regex to look only at the last occurrence of "> and then pull the value between it and </ProductCode>

You should probably post more of the text if there is more. Otherwise I'd just do something like `'">UNWANTEDTEXT">APRODUCT'.split("\">")['">UNWANTEDTEXT">APRODUCT'.split("\">").length - 1].split("<")[0]`. Also, which regex implementation are you using? (Tell us a little about your stack.) - Austin T French, Nov 22 '17 at 00:46

score 2 · Accepted Answer · answered Nov 22 '17 at 01:07

The easiest solution is to specify which characters you want to match instead of matching any character. Here, that means any character that's not a closing angle bracket:

([^>]*)<\/ProductCode>
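
The question doesn't say which regex implementation is in use, so here is a minimal sketch in Python's `re` module, purely as an assumption about the stack. It applies the pattern above to the sample text from the question. The variable names are only illustrative.

```python
import re

# Sample text from the question.
text = '">UNWANTEDTEXT">APRODUCT</ProductCode>'

# [^>]* cannot cross a ">", so the capture can only begin
# after the last ">" that precedes the closing tag.
match = re.search(r'([^>]*)</ProductCode>', text)
product = match.group(1) if match else None

print(product)  # APRODUCT
```

In Python the forward slash needs no escaping. The `\/` in the pattern above is only required in flavours that use `/` as the regex delimiter.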

If the value itself can contain a closing angle bracket that isn't preceded by a quotation mark, the solution gets a little hairier. Assuming your regex library supports zero-width assertions:

(?:">)?((?:(?!">).)*)<\/ProductCode>
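
As a rough sketch of how the two patterns differ, again assuming Python's `re` module: the input below is a hypothetical variation of the question's text in which the value contains a bare `>`. The lookahead version keeps the whole value, while the character-class version is cut short at the bare `>`.

```python
import re

# Consume any character as long as it does not start a '">' pair.
pattern = re.compile(r'(?:">)?((?:(?!">).)*)</ProductCode>')

# Hypothetical input: the wanted value itself contains a bare ">".
text = '">UNWANTEDTEXT">A>PRODUCT</ProductCode>'

match = pattern.search(text)
tempered = match.group(1) if match else None

# For comparison, the simpler character-class pattern on the same input.
simple_match = re.search(r'([^>]*)</ProductCode>', text)
simple = simple_match.group(1) if simple_match else None

print(tempered)  # A>PRODUCT
print(simple)    # PRODUCT
```

Note that `.` does not match newlines by default in Python, so a value spanning several lines would need the `re.DOTALL` flag.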

Hope this helps!

I also want to add that if you're parsing SGML, you might consider using a library dedicated to that purpose instead of trying to cobble together your own parser based on regular expressions. That path is fraught with peril.

Looks good. The first and easiest solution fits my scenario nicely. Thanks - Peter H, Nov 22 '17 at 01:19
