i'm looking for a way to check whether a multiline string (from a pdf) contains a certain letter combination which must not start with a specific prefix. Specifically, i'm trying to find Strings that contain ARC
but don't contain NON-ARC
.
I found this great example Regular expression for a string that does not start with a sequence but it seems it does not work with my problem. With my pattern ^(?!NON\\-)ARC.*
i get the expected result in a single line test, with real input the negative look ahead assertion has a false positive. Here is what i did:
@Test
public void testRegexLookAhead() {
String strTestSimplePos = "ARC 0.1-1";
String strTestSimpleNeg = "NON-ARC 3.4-1";
String strTestRealPos = "HEADLINE\r\n" + "Subheader Author\r\n" + "ARC 0.1-1\r\n" + "20190211";
String strTestRealNeg = "HEADLINE\r\n" + "Subheader Author\r\n" + "NON-ARC 0.1-1\r\n" + "20190211";
//based on https://stackoverflow.com/questions/899422/regular-expression-for-a-string-that-does-not-start-with-a-sequence
String regexNoNON = "^(?!NON\\-)ARC.*";
Pattern noNONPatter = Pattern.compile(regexNoNON);
System.out.println(noNONPatter.matcher(strTestSimplePos).find()); //true OK
System.out.println(noNONPatter.matcher(strTestSimpleNeg).find()); //false OK
System.out.println(noNONPatter.matcher(strTestRealPos).find()); //false but should be true -> does not work as intended
System.out.println(noNONPatter.matcher(strTestRealNeg).find()); //false OK
Would be glad if anyone can point out what went wrong...
Edit: This was marked as a duplicate of How to use java regex to match a line - however i didn't try to use a regex to match a line at all. Just needed a way to find a specific sequence (with negative look-ahead) for a multiline text input. One approach to solve the other question is also the solution to this one (compile pattern with java.util.regex.Pattern.MULTILINE) - but the questions are at best related.