What I want to do:
Using Java, I want to match a RegEx pattern, unless the match is immediately followed by a "poison" suffix.
Examples:
"legitString" RETURNS "legitString"
"legitString blabla" RETURNS "legitString"
"legitString PoisonousSuffix" RETURNS "legitString"
"legitStringPoisonousSuffix" RETURNS no match
My use case:
I need to parse as many references from a file as I can, following a particular pattern. But some lines of the file are truncated, and not always at the same length(!).
Luckily, when this happens, the line ends with ">>". In that case I have to assume the reference is truncated and discard it. So ">>$" would be the poisonous suffix in my case. On the other hand, if ">>" is in the middle of the text, I should safely extract the reference as I normally would. (The reference ends with digits, but the number of digits can be different each time, so I can't rely on that.)
So in my case:
"REF" RETURNS "REF"
"REF >>" RETURNS "REF"
"REF>>" RETURNS nothing
"REF>> bla " RETURNS "REF" // because in my case, the poison is only poisonous at the end of the line
I've read https://stackoverflow.com/tags/regex/info, but when I tried the syntax
myRegex(?!>>$)
it gave the wrong result. When the line ends with ">>", the match drops the last legitimate digit of the reference, which is the worst scenario: a corrupted reference gets through.
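To make the problem reproducible, here is a minimal sketch of that first attempt applied to the reference pattern given further down in this question. The class name `LookaheadBacktrackDemo` and the helper `firstMatch` are only illustrative names. As far as I can tell, the digit is lost because the trailing `\d*` backtracks by one character until the lookahead no longer sees ">>" directly in front of it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadBacktrackDemo {
    // Reference pattern from this question with the attempted lookahead appended.
    static final Pattern NAIVE = Pattern.compile(
            "\\b(SWN-)?WZ-SB\\d{2}(-\\d{2}){2}-[A-Z]?\\d*(?!>>$)");

    // Returns the first match found in the line, or null when nothing matches.
    static String firstMatch(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // \d* first consumes "11", the lookahead fails, \d* gives back one digit,
        // and the lookahead then sees "1>>" instead of ">>" and succeeds.
        System.out.println(firstMatch(NAIVE, "SWN-WZ-SB00-49-03-C11>>"));
        // prints SWN-WZ-SB00-49-03-C1 (last digit lost)
    }
}
```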
I've also seen "Regex for string not ending with given suffix", but
myRegex(?:(?!>>).).$
rejects legitimate references.
My exact regex (without poison):
\b(SWN-)?WZ-SB\d{2}(-\d{2}){2}-[A-Z]?\d*
should return SWN-WZ-SB00-49-03-C11 for:
"SWN-WZ-SB00-49-03-C11>> bla"
"SWN-WZ-SB00-49-03-C11 >> "
"SWN-WZ-SB00-49-03-C11 >>"
"SWN-WZ-SB00-49-03-C11 >> bla"
and nothing for:
"SWN-WZ-SB00-49-03-C11>>"
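One direction that seems to satisfy these cases is to wrap the whole pattern in an atomic group, `(?>...)`, before the negative lookahead, so the engine cannot give back digits once the reference has been consumed. This is only a sketch under that assumption, not a verified solution; `PoisonSuffixDemo` and `extract` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PoisonSuffixDemo {
    // Atomic group around the reference pattern, then the negative lookahead.
    // Once the group has matched, its contents are not backtracked into.
    static final Pattern SAFE = Pattern.compile(
            "(?>\\b(SWN-)?WZ-SB\\d{2}(-\\d{2}){2}-[A-Z]?\\d*)(?!>>$)");

    // Returns the extracted reference, or null when the line has no safe match.
    static String extract(String line) {
        Matcher m = SAFE.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("SWN-WZ-SB00-49-03-C11>> bla")); // SWN-WZ-SB00-49-03-C11
        System.out.println(extract("SWN-WZ-SB00-49-03-C11 >> "));   // SWN-WZ-SB00-49-03-C11
        System.out.println(extract("SWN-WZ-SB00-49-03-C11 >>"));    // SWN-WZ-SB00-49-03-C11
        System.out.println(extract("SWN-WZ-SB00-49-03-C11 >> bla")); // SWN-WZ-SB00-49-03-C11
        System.out.println(extract("SWN-WZ-SB00-49-03-C11>>"));     // null
    }
}
```

Note that in Java `$` also matches just before a final line terminator by default, so a line read with its trailing newline would still be treated as ending with ">>".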
Bonus
Is there a way to generalize this and have a function that takes a regexPattern and a poisonousSuffix and returns a safeRegexPattern?
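To make the bonus concrete, this is roughly the shape of helper I have in mind. It is a sketch under two assumptions that may not hold in general: the poisonous suffix is a literal string (quoted with `Pattern.quote`) that only counts at the end of the line, and wrapping the original pattern in an atomic group is enough to stop it from shrinking the match. `SafePatterns` and `safePattern` are made-up names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SafePatterns {
    // Builds a pattern that matches regexPattern unless the match is immediately
    // followed by the literal poisonousSuffix at the end of the input.
    // Assumption: the suffix is plain text, not a regex.
    static Pattern safePattern(String regexPattern, String poisonousSuffix) {
        return Pattern.compile(
                "(?>" + regexPattern + ")(?!" + Pattern.quote(poisonousSuffix) + "$)");
    }

    public static void main(String[] args) {
        Pattern safe = safePattern(
                "\\b(SWN-)?WZ-SB\\d{2}(-\\d{2}){2}-[A-Z]?\\d*", ">>");

        Matcher ok = safe.matcher("SWN-WZ-SB00-49-03-C11>> bla");
        System.out.println(ok.find() ? ok.group() : null); // SWN-WZ-SB00-49-03-C11

        Matcher poisoned = safe.matcher("SWN-WZ-SB00-49-03-C11>>");
        System.out.println(poisoned.find() ? poisoned.group() : null); // null
    }
}
```

The atomic group is non-capturing, so the group numbers of the original pattern should stay the same.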
Thanks