Whenever I enter the following...

Pattern pmessage = Pattern.compile("\s*\p{Alnum}[\p{Alnum}\s]*");
Matcher mmessage = pmessage.matcher(message);
Matcher msubject = pmessage.matcher(subject);

I get an "Invalid escape sequence" error. Does anyone have any idea why, and how I can fix this?

Skizit
  • Be warned that even corrected for ddoouubbllee bbaacckkssllllaasshheess, that doesn’t work with Java native characters, only with ASCII. – tchrist Dec 03 '10 at 12:45
  • Shouldn't that be `bbaacckkssllaasshheess` instead of `bbaacckkssllllaasshheess`? :) – Bart Kiers Dec 03 '10 at 13:00

4 Answers

For a version of \p{Alpha} that works on the Java native character set, instead of being stuck unable to process anything other than legacy data from the 1960s, you need to use

alphabetics = "[\\pL\\pM\\p{Nl}]";

For a version of numerics in the same sense, you have to choose which of these you want:

ASCII_digits    = "[0-9]";
all_numbers     = "\\pN";
decimal_numbers = "\\p{Nd}";

because which one applies varies depending on circumstances. We’ll assume you copied one of those three to a numerics variable.

Assuming you then want alphanumerics based on the definition above, you could then write:

 alphanumerics = "[" + alphabetics + numerics + "]";
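
As a rough illustration of the difference, here is a minimal sketch that builds the alphanumerics class from the pieces above and compares it with \p{Alnum}. It assumes decimal_numbers was the numeric choice and that the \p{Nl} property has its closing brace; the class name and the two helper methods are hypothetical, made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern strings come from the answer.
class UnicodeAlnumDemo {
    static final String ALPHABETICS = "[\\pL\\pM\\p{Nl}]";
    static final String NUMERICS = "\\p{Nd}"; // assumption: decimal_numbers was chosen
    static final String ALPHANUMERICS = "[" + ALPHABETICS + NUMERICS + "]";

    // \p{Alnum} is ASCII-only unless UNICODE_CHARACTER_CLASS is set.
    static final Pattern ASCII_ALNUM = Pattern.compile("\\p{Alnum}+");
    static final Pattern UNICODE_ALNUM = Pattern.compile(ALPHANUMERICS + "+");

    static boolean asciiMatches(String s) {
        return ASCII_ALNUM.matcher(s).matches();
    }

    static boolean unicodeMatches(String s) {
        return UNICODE_ALNUM.matcher(s).matches();
    }

    public static void main(String[] args) {
        String word = "na\u00EFve42"; // "naïve42"
        System.out.println(asciiMatches(word));   // false: ï is outside ASCII
        System.out.println(unicodeMatches(word)); // true: ï is in \pL
    }
}
```

Both patterns accept plain ASCII input such as "abc123"; they only diverge once non-ASCII letters or digits show up.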

However, if what you mean by alphanumerics is the \w sense of program identifiers, you have to add a few more properties:

 identifier_chars = "[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]";
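
A small sketch of how that class behaves next to Java's default \w, which only covers ASCII letters, digits, and underscore. The class name and helper methods are hypothetical; only the identifier_chars string is taken from above.

```java
import java.util.regex.Pattern;

// Hypothetical demo class comparing \w with the identifier_chars class.
class IdentifierCharsDemo {
    static final String IDENTIFIER_CHARS =
        "[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]";

    static final Pattern ASCII_WORD = Pattern.compile("\\w+");
    static final Pattern UNICODE_WORD = Pattern.compile(IDENTIFIER_CHARS + "+");

    static boolean asciiWord(String s) {
        return ASCII_WORD.matcher(s).matches();
    }

    static boolean unicodeWord(String s) {
        return UNICODE_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String name = "\u00E9t\u00E9_2010"; // "été_2010"
        System.out.println(asciiWord(name));   // false: é is not in [a-zA-Z_0-9]
        System.out.println(unicodeWord(name)); // true: letters, \p{Pc} underscore, \p{Nd} digits
    }
}
```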

This issue is discussed at length in this answer, where you’ll also find a link to some alpha code of mine that does these transforms for you automatically. I hope to get a chance to rewrite it to take up less space this weekend.

tchrist

Double each backslash: Pattern.compile("\\s*\\p{Alnum}[\\p{Alnum}\\s]*")

Backslashes inside string literals have a special meaning, and have to be doubled in order for an actual backslash character to become part of the string (which is what your regex requires).
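
To make this concrete, here is a minimal sketch using the corrected pattern from the question. The class name and the isValid helper are hypothetical, added only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern is the question's regex with doubled backslashes.
class EscapedPatternDemo {
    // In Java source, "\\s" is the two-character string \s that the regex engine expects.
    static final Pattern MESSAGE = Pattern.compile("\\s*\\p{Alnum}[\\p{Alnum}\\s]*");

    // True when the whole input matches the pattern.
    static boolean isValid(String input) {
        return MESSAGE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println("\\s".length());              // 2: a backslash and an 's'
        System.out.println(isValid("  Hello world 42")); // true
        System.out.println(isValid("???"));              // false: no alphanumeric character
    }
}
```

The compiler turns each `\\` into one backslash, so the regex engine ends up seeing `\s*\p{Alnum}[\p{Alnum}\s]*`.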

NPE

Keep in mind that backslashes are special characters in Java strings and need to be escaped with an additional backslash:

Pattern.compile("\\s*\\p{Alnum}[\\p{Alnum}\\s]*");
RoToRa

You didn't escape your "\" characters correctly: in Java, "\\s" will give you \s, so you should write:

Pattern.compile("\\s*\\p{Alnum}[\\p{Alnum}\\s]*");
Mikarnage