
I have the following text

">UNWANTEDTEXT">APRODUCT</ProductCode>

I'm trying to build a regex that gives me just this text:

APRODUCT

The regex I have at the moment is:

">(.*?)<\/ProductCode>

The problem is that the same `">` pattern also occurs at the start of the string, so my regex starts matching at the first `">` and captures `UNWANTEDTEXT">APRODUCT`. I need a way of telling the regex to look only at the last occurrence of `">` and then pull the value between it and `</ProductCode>`.
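A quick way to see the problem, sketched in JavaScript (an assumption, since the question doesn't say which regex engine is in use): the lazy `.*?` only keeps the match as short as possible from where it starts, and the engine still starts at the leftmost `">` it can.

```javascript
// Sample input from the question.
const input = '">UNWANTEDTEXT">APRODUCT</ProductCode>';

// The original pattern. Being lazy does not move the start of the match:
// the engine begins at the first '">' and expands until </ProductCode> fits.
const original = /">(.*?)<\/ProductCode>/;
const m = input.match(original);

console.log(m[1]); // UNWANTEDTEXT">APRODUCT
```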

Peter H
  • You should probably post more of the text if there is more... Otherwise I'd just do something like `'">UNWANTEDTEXT">APRODUCT'.split("\">")['">UNWANTEDTEXT">APRODUCT'.split("\">").length - 1].split("<")[0]`. Also, which regex implementation are you using? (A little about your stack would help.) – Austin T French Nov 22 '17 at 00:46
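The split-based idea from the comment can be written a little more compactly with `Array.prototype.pop()` instead of indexing with `length - 1`. This is only a sketch: it assumes the string ends with the `</ProductCode>` element, as in the sample, so that the last `">` is the one you want.

```javascript
const input = '">UNWANTEDTEXT">APRODUCT</ProductCode>';

// Split on every '">', keep the last piece ("APRODUCT</ProductCode>"),
// then cut it at the first '<' to drop the closing tag.
const value = input.split('">').pop().split('<')[0];

console.log(value); // APRODUCT
```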

1 Answer


The easiest solution is to indicate which characters you want to match instead of any character, i.e. any character that's not a closing angle bracket:

([^>]*)<\/ProductCode>
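Here is the character-class version applied to the sample string, sketched in JavaScript (the question doesn't name a language, so that choice is an assumption):

```javascript
const input = '">UNWANTEDTEXT">APRODUCT</ProductCode>';

// [^>]* can never match a '>', so the captured text cannot cross one:
// it begins right after the nearest '>' before </ProductCode>.
const m = input.match(/([^>]*)<\/ProductCode>/);

console.log(m ? m[1] : null); // APRODUCT
```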

If the value itself can contain a closing angle bracket that isn't preceded by a quotation mark, the solution gets a little hairier. Assuming your regex library supports zero-width assertions (here, a negative lookahead):

(?:">)?((?:(?!">).)*)<\/ProductCode>
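)?((?:(?!">).)*)<\/ProductCode>">
To illustrate the difference, here is a JavaScript sketch using a hypothetical value, `A>PRODUCT`, that contains a bare `>`. The part `(?:(?!">).)*` is often called a tempered greedy token: `.` may match any character as long as a `">` doesn't start at that position. Note that in JavaScript `.` does not match line breaks unless you add the `s` flag.

```javascript
// Hypothetical input: the value contains a '>' that is not preceded by '"'.
const input = '">UNWANTEDTEXT">A>PRODUCT</ProductCode>';

const simple = /([^>]*)<\/ProductCode>/;
const tempered = /(?:">)?((?:(?!">).)*)<\/ProductCode>/;

// The character class stops at the bare '>' and loses the "A>".
console.log(input.match(simple)[1]);   // PRODUCT
// The lookahead only refuses to cross '">', so the whole value survives.
console.log(input.match(tempered)[1]); // A>PRODUCT
```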

Hope this helps!

I also want to add that if you're parsing SGML, you might consider using a library dedicated to that purpose instead of trying to cobble together your own parser based on regular expressions. That path is fraught with peril.

mwp