
I have a very long regular expression that seems to be having issues, but only when imported from a text file. I've narrowed it down to the following section (shown here as a literal String):

"(?i)(?<!\\w)\\w{2,3}(?=\\))"

As you can see, near the end, I am trying to escape a closing parenthesis for a lookahead. Now, if this is hard-coded, like:

Pattern myPattern = Pattern.compile("(?i)(?<!\\w)\\w{2,3}(?=\\))");

It works completely as expected. If, however, I read it from a text file, like:

File patternFile = new File("patterns.txt");
List<String> patternText = FileUtils.readLines(patternFile);
String ucText = patternText.get(0).trim();
Pattern myPattern = Pattern.compile(ucText);

Then I get the error message:

Exception in thread "Thread-4" java.util.regex.PatternSyntaxException: Unmatched closing ')' near index 25
(?i)(?<!\\w)\\w{2,3}(?=\\))
                         ^

So, why is this happening? Why is escaping a closing parenthesis legal when hard-coded, but not when reading from a text file?

Sturm
  • `only when imported from a text file` You have to print that to the console. If it prints out `(?i)(?<!\w)\w{2,3}(?=\))`, it's OK; if it prints out with the backslashes doubled, you have to unescape them. –  May 27 '15 at 21:19
  • Only use \\ for regex defined in string, otherwise use single \ – MaxZoom May 27 '15 at 21:22

2 Answers

4

You're writing a Java string literal. \) is not a legal escape code for Java string literals.

You need to escape every backslash with \\ to create a string with a single backslash for the regex.
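
To make the difference concrete, here is a minimal sketch. The class name `EscapeDemo`, its constants, and the `compiles` helper are illustrative and not from the question. `FROM_FILE` is an assumption: it simulates a line of patterns.txt that contains the doubled backslashes shown in the error message.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class EscapeDemo {
    // Java string literal: the compiler turns each \\ into ONE backslash,
    // so the regex engine sees (?i)(?<!\w)\w{2,3}(?=\))
    static final String FROM_LITERAL = "(?i)(?<!\\w)\\w{2,3}(?=\\))";

    // Simulated file content (assumption): the file itself holds doubled
    // backslashes, so the regex engine sees (?i)(?<!\\w)\\w{2,3}(?=\\))
    static final String FROM_FILE = "(?i)(?<!\\\\w)\\\\w{2,3}(?=\\\\))";

    // Returns true if the text is a syntactically valid regex.
    static boolean compiles(String regexText) {
        try {
            Pattern.compile(regexText);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Printing shows what the regex engine actually receives.
        System.out.println(FROM_LITERAL + " compiles: " + compiles(FROM_LITERAL));
        System.out.println(FROM_FILE + " compiles: " + compiles(FROM_FILE));
    }
}
```

In the second string, `\\` is an escaped backslash as far as the regex engine is concerned, so the `)` after it closes the lookahead and the final `)` is left unmatched.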

SLaks
  • Sorry, I had the wrong code copied & pasted. I've corrected it. I assume you were referring to the "hard-coded" example that actually works fine. – Sturm May 27 '15 at 21:14
  • @Sturm: Now you have too many slashes. Your string has an escaped backslash, not an escaped `)`. Your text file is _not_ a Java string literal. – SLaks May 27 '15 at 21:31
  • So, as a String literal, double-escaping is necessary, but if a String is stored in a variable, then it should not be double-escaped? – Sturm May 28 '15 at 01:58
  • @Sturm - I think that in languages like C/C++, every escape in a double-quoted string in the source code is checked for being a valid escape char (like "\n"), and if it is, the replacement is made. If it is not valid, it is just un-escaped. So `"(?i)(?<!\w)\w{2,3}(?=\))"` in source code becomes `(?i)(?<!w)w{2,3}(?=))` in memory, which would throw when passed to a regex constructor. Start with the raw regex, escape the escapes, put double quotes around it, and put it in the source code. –  May 28 '15 at 15:20
  • Okay, I think I've got my head wrapped around it now. Only when using a String **literal** to compile a RegEx Pattern do I need to double-escape. When getting the string from anything else (i.e., *not* declared as a String literal), do **not** double-escape. It works now that I just have `(?<!\w)\w{2,3}(?=\))` in the text file. Thanks, @SLaks! – Sturm May 28 '15 at 20:40
0

only when imported from a text file

You have to print that to the console.
If it prints out (?i)(?<!\w)\w{2,3}(?=\)), it's OK;
if it prints out double-escaped, you have to un-escape it.

A good way to un-escape the escape character is to do a global find/replace
(this is 90% of the parsing):

Find "(?x)\\\\ \\\\"
Replace "\\\\"
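
As a rough Java sketch of that find/replace, assuming the Find and Replace strings above are meant as Java string literals (the class and method names here are hypothetical):

```java
import java.util.regex.Pattern;

class CollapseBackslashes {
    // Collapses every pair of backslashes into a single backslash.
    static String collapse(String text) {
        // Regex (?x)\\ \\ : two literal backslashes; the space is ignored
        // because (?x) turns on comments mode.
        // Replacement \\ : one literal backslash.
        return text.replaceAll("(?x)\\\\ \\\\", "\\\\");
    }

    public static void main(String[] args) {
        // Simulated line from the text file, with doubled backslashes (assumption).
        String fromFile = "(?i)(?<!\\\\w)\\\\w{2,3}(?=\\\\))";
        String fixed = collapse(fromFile);
        System.out.println(fixed);
        // The collapsed text is now a valid regex.
        Pattern p = Pattern.compile(fixed);
        System.out.println(p.matcher("(abc)").find());
    }
}
```

Since no regex features are really needed here, `text.replace("\\\\", "\\")` would do the same job with a plain literal replacement.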

Un-escaping the remaining escapes (a backslash followed by some other
character) is a relative matter. It depends on the character and the
substitution, or on taking no action at all. This is mostly language-specific,
but you can roll your own. For this, the basics are:

Find "(?xs)\\\\ (.)"
Replace (roll your own)
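
One possible way to roll your own in Java is sketched below. The class name and the per-character policy (`\\` becomes a backslash, `\n` a newline, `\t` a tab, everything else left alone) are illustrative assumptions, not a fixed rule; pick whatever substitutions suit your input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RollYourOwnUnescape {
    // A backslash followed by any single character; (?s) lets . match newlines,
    // (?x) makes the space in the pattern insignificant.
    private static final Pattern ESCAPE = Pattern.compile("(?xs)\\\\ (.)");

    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement;
            switch (m.group(1).charAt(0)) {
                case '\\': replacement = "\\"; break; // escaped backslash
                case 'n':  replacement = "\n"; break; // newline
                case 't':  replacement = "\t"; break; // tab
                default:   replacement = m.group();   // no action: keep as-is
            }
            // quoteReplacement stops \ and $ from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Doubled backslashes, as in the problematic text file (assumption).
        System.out.println(unescape("(?i)(?<!\\\\w)\\\\w{2,3}(?=\\\\))"));
    }
}
```

Because the scan consumes a backslash together with the character after it, a doubled backslash followed by `w` yields `\w` rather than being unescaped a second time.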