-1

I have last names in an XML file and would like to capture each unique name only once. I started from this other Stack Overflow answer: "Only match unique string occurrences". However, I can't get the pattern below to match my input and return one Adams and one Yellow.

\b(.*<LastName>(.*)<\/LastName>)\b(?![\s\S]*\b\1\b)

              <LastName>Adams</LastName>
              <LastName>Adams</LastName>
              <LastName>Yellow</LastName>

https://regex101.com/r/2wLsm5/1

Wiktor Stribiżew
imparante
  • What tool are you using, and why are you not using an XML parser? Or even just pipe it to `sort -u`. – miken32 May 13 '22 at 21:06
  • Does this answer your question? [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Ryszard Czech May 13 '22 at 22:08

2 Answers

1

Does this work for you?

/<LastName>(\w+)<\/LastName>(?!.*<LastName>\1<\/LastName>)/gsm (note the flags, they're important)

Demo

The issue was that the (.*) you used to match the name allowed it to match across multiple lines. I replaced it with \w+ so it only matches word characters (depending on your needs, you may want something more permissive, for example to allow hyphens, apostrophes, or non-ASCII letters).
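For anyone who wants to try this outside regex101, here is a minimal sketch in Python (the question doesn't say which tool is in use, so Python is an assumption). re.DOTALL stands in for the s flag, and re.findall takes the place of g; the variable names are just for illustration.

```python
import re

# Sample input taken from the question.
xml = """
              <LastName>Adams</LastName>
              <LastName>Adams</LastName>
              <LastName>Yellow</LastName>
"""

# re.DOTALL corresponds to the "s" flag: "." inside the lookahead may cross
# newlines, so the lookahead can scan the rest of the input for a repeat.
# There is no "g" flag in Python; re.findall already returns every match.
pattern = re.compile(
    r"<LastName>(\w+)</LastName>(?!.*<LastName>\1</LastName>)",
    re.DOTALL,
)

# With a single capture group, findall returns the group-1 text of each match.
unique_names = pattern.findall(xml)
print(unique_names)
```

Note that the negative lookahead rejects a name whenever the same name appears again later, so it is the last occurrence of each name that gets matched.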

isaactfa
  • Also of note: The name will be in capture group 1. The whole match will include the tags. This could be changed with look-arounds. – isaactfa May 13 '22 at 20:30
  • Ah, okay. I didn't know that was searching multiple lines. Thank you very much for explaining that! – imparante May 14 '22 at 00:10
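As a rough illustration of the look-around idea from the comment above, the tags can be moved into a lookbehind and a lookahead so that the whole match is just the name. This is only a sketch in Python (an assumed tool); Python's lookbehind must be fixed-width, which the literal <LastName> tag satisfies.

```python
import re

# Sample input taken from the question.
xml = """
              <LastName>Adams</LastName>
              <LastName>Adams</LastName>
              <LastName>Yellow</LastName>
"""

pattern = re.compile(
    r"(?<=<LastName>)"                  # opening tag must precede the name
    r"(\w+)"                            # the name itself (also group 1)
    r"(?=</LastName>)"                  # closing tag must follow the name
    r"(?!.*<LastName>\1</LastName>)",   # same name must not appear again later
    re.DOTALL,
)

# group(0) is the whole match, which now excludes the tags.
whole_matches = [m.group(0) for m in pattern.finditer(xml)]
print(whole_matches)
```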
0

You can capture the name of the tag and its content.
Then use the backreferences in the negative lookahead.

A lazy search .*? for the tag's content helps here.

<(LastName)>(.*?)<\/\1>(?![\s\S]*?<\1>\2<\/\1>)

Test on regex101 here
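A small sketch of the same pattern run from Python, in case it helps (Python is an assumption here, since the question doesn't name a tool). No flags are needed: [\s\S] already crosses newlines, and the lazy .*? stays on one line because . does not match a newline by default.

```python
import re

# Sample input taken from the question.
xml = """
              <LastName>Adams</LastName>
              <LastName>Adams</LastName>
              <LastName>Yellow</LastName>
"""

# Group 1 is the tag name, group 2 is the tag's content.
# The lookahead reuses both backreferences to look for a later duplicate.
pattern = re.compile(r"<(LastName)>(.*?)</\1>(?![\s\S]*?<\1>\2</\1>)")

# Collect the content (group 2) of every match.
names = [m.group(2) for m in pattern.finditer(xml)]
print(names)
```

Because the tag name is captured rather than repeated, switching to another element only means changing the text inside the first group.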

LukStorms