Why am I not getting the same behavior when I build a Pattern from a literal regular expression and when I read the regular expression from a file?

String regex = "(?xi)(title)[\\.:;](.*) \043 Title property";
Pattern pattern = Pattern.compile(regex);
System.out.println(pattern);
// will print:

(?xi)(title)[.:;](.*) # Title property

This expression works, yet when I read the same regular expression from a file I run into a problem: the '\043' isn't converted to '#'. Why is that?

I'm trying to avoid the literal '#' character and use an alternative representation of it instead, because of other conflicts in my code.

VLAZ
  • The file contains `\043` literally? In source code, that's replaced by the compiler. You get no such behavior when you read it from wherever at runtime. – Sotirios Delimanolis Aug 30 '16 at 14:35
  • Why do you want to avoid #? – J Fabian Meier Aug 30 '16 at 14:36
  • Then use `\u0023` if you do not want to use `#`. Please provide an [MCVE (minimal complete verifiable example)](http://stackoverflow.com/help/mcve). I suspect you just do not have `#` in your input. – Wiktor Stribiżew Aug 30 '16 at 14:50
  • I had the wrong duplicate and I can't find a more relevant one. What you want to do is parse an _octal escape_ sequence from a file. – Sotirios Delimanolis Aug 30 '16 at 14:55
  • @SotiriosDelimanolis: http://stackoverflow.com/questions/3537706/how-to-unescape-a-java-string-literal-in-java? – Wiktor Stribiżew Aug 30 '16 at 15:13
  • @WiktorStribiżew It's hidden somewhere in there. Up to you :) – Sotirios Delimanolis Aug 30 '16 at 15:14
  • The main idea is to build a small scanner that uses a file representation like the Trivial Graph Format, which is very basic and easy to understand. The problem is that '#' is used as a divider in the file specification to separate the data for nodes and edges, hence the use of representations like \u0023 in Unicode and \043 in octal for that symbol. – alejandro romero Aug 30 '16 at 17:19

1 Answer

 assertEquals(1, "\043".length());
 assertEquals("#", "\043");

... both pass.

The Java compiler turns the octal escape \043 in a string literal into one character ("#") before the program ever runs.

If you read a file containing:

\043

... into a String, then:

 assertEquals("\\043", stringFromFile);

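... will pass. The file holds four separate characters (\, 0, 4 and 3), and nothing translates them at runtime. The \\ in the assertion is simply how a Java string literal spells a single backslash.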
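If the file has to keep \043, one option is to translate the octal escapes yourself after reading the line. The following is only a sketch under stated assumptions: the class OctalEscapeDemo and the helpers unescapeOctal and finds are hypothetical names, not a standard API, and the string in main stands in for text read from a file. It also shows why the raw text behaves differently. java.util.regex treats \043 as an escaped literal '#', so under (?x) it does not start a comment and the pattern then requires "#Titleproperty" in the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OctalEscapeDemo {
    // A backslash followed by one to three octal digits, capped at \377
    // like octal escapes in Java string literals.
    private static final Pattern OCTAL_ESCAPE =
            Pattern.compile("\\\\([0-3][0-7]{2}|[0-7]{1,2})");

    // Hypothetical helper: replaces each octal escape in text read at
    // runtime with the character it names. It does not handle an escaped
    // backslash in front of the digits (e.g. \\043 in the file).
    static String unescapeOctal(String raw) {
        Matcher m = OCTAL_ESCAPE.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 8);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Stands in for the line as it would arrive from the file: after
        // compilation this string contains real backslash characters.
        String fromFile = "(?xi)(title)[\\.:;](.*) \\043 Title property";

        // Raw text: the regex engine reads \043 as a literal '#'.
        System.out.println(finds(fromFile, "Title: foo"));                // false
        // Unescaped text: '#' starts a comment under (?x), as in the question.
        System.out.println(finds(unescapeOctal(fromFile), "Title: foo")); // true
    }
}
```

This sketch handles only octal escapes. Other escapes such as \u0023 or \n would need their own handling, or a fuller unescaping routine like those discussed in the question linked in the comments.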

slim