
I am trying to match certain Japanese character blocks with a regex, based on this post, using the String.matches(String regex) method.

But both the range regex [\\x3041-\\x3096] and the property regex \p{Hiragana} throw a PatternSyntaxException. My IDE also suggests property names, but none of the suggestions appear to cover Japanese characters.

The code that throws this error is:

    c.matches( "[\\x3041-\\x3096]" )

The stack trace is:

[\x3041-\x3096]
           ^
    at java.base/java.util.regex.Pattern.error(Pattern.java:2015)
    at java.base/java.util.regex.Pattern.range(Pattern.java:2813)
    at java.base/java.util.regex.Pattern.clazz(Pattern.java:2701)
    at java.base/java.util.regex.Pattern.sequence(Pattern.java:2126)
    at java.base/java.util.regex.Pattern.expr(Pattern.java:2056)
    at java.base/java.util.regex.Pattern.compile(Pattern.java:1778)
    at java.base/java.util.regex.Pattern.<init>(Pattern.java:1427)
    at java.base/java.util.regex.Pattern.compile(Pattern.java:1068)
    at java.base/java.util.regex.Pattern.matches(Pattern.java:1173)
    at java.base/java.lang.String.matches(String.java:2024)
    at lib.UIE.TextInput.valid(TextInput.java:49)
user10385242
  • Edit your question and include the exact Java source code that invokes `String.matches`. `[\x3041-\x3096]` is not doing what you think; I suggest searching for `\x` in [the documentation](https://docs.oracle.com/en/java/javase/12/docs/api/java.base/java/util/regex/Pattern.html). – VGR Jul 26 '19 at 13:11
  • Are you sure that’s your code? `"[\x3041-\x3096]"` will not compile in Java. – VGR Jul 26 '19 at 13:35
  • @VGR thanks for noticing. edited. – user10385242 Jul 26 '19 at 14:29

1 Answer


For the property regex, try \p{IsHiragana} instead. The Is prefix denotes Unicode scripts and categories, whereas blocks use the In prefix (for example \p{InHiragana}).

UPDATE: As @VGR pointed out, the \x3041 notation in the original post has nothing to do with Java; \u3041 should be used instead.
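To make the prefix rule concrete, here is a minimal sketch that checks which spellings java.util.regex.Pattern accepts. The class name HiraganaPropertyDemo and the helper compiles are made up for illustration, and the sample characters are written as Unicode escapes so the source file encoding does not matter.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of the original question.
class HiraganaPropertyDemo {

    // Returns true if Pattern accepts the regex, false if it throws PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String hiragana = "\u3069"; // HIRAGANA LETTER DO, the sample used in this answer
        String katakana = "\u30C9"; // KATAKANA LETTER DO, for contrast

        System.out.println(compiles("\\p{Hiragana}"));   // false: bare script name is rejected
        System.out.println(compiles("\\p{IsHiragana}")); // true: Is prefix, Unicode script
        System.out.println(compiles("\\p{InHiragana}")); // true: In prefix, Unicode block

        System.out.println(hiragana.matches("\\p{IsHiragana}")); // true
        System.out.println(katakana.matches("\\p{IsHiragana}")); // false
    }
}
```

Note that String.matches tests the whole string, so a multi-character input would need a quantifier such as \\p{IsHiragana}+.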

    Pattern.matches("\\p{IsHiragana}", "ど"); //true
    Pattern.matches("[\u3041-\u3096]", "ど"); //true
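For completeness, a small sketch of how the different escape spellings behave, as far as I understand the Pattern documentation. The class name HiraganaRangeDemo and the helper compiles are hypothetical; the braced \x{...} form assumes Java 7 or later.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class, not part of the original question.
class HiraganaRangeDemo {

    // Single backslash: the Java compiler expands the escapes, so the regex
    // engine receives the two raw characters as the range endpoints.
    static final Pattern COMPILER_ESCAPE = Pattern.compile("[\u3041-\u3096]");

    // Doubled backslash: the regex engine itself parses the four-digit escape.
    static final Pattern REGEX_ESCAPE = Pattern.compile("[\\u3041-\\u3096]");

    // Braced hex form, which takes a full code point.
    static final Pattern BRACED_HEX = Pattern.compile("[\\x{3041}-\\x{3096}]");

    // Returns true if Pattern accepts the regex, false if it throws PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String sample = "\u3069"; // HIRAGANA LETTER DO

        System.out.println(COMPILER_ESCAPE.matcher(sample).matches()); // true
        System.out.println(REGEX_ESCAPE.matcher(sample).matches());    // true
        System.out.println(BRACED_HEX.matcher(sample).matches());      // true

        // Unbraced \x takes exactly two hex digits, so \x3041 means "0", "4", "1".
        System.out.println("041".matches("\\x3041"));                  // true

        // Inside the class this leaves the range '1' to '0', which is out of order.
        System.out.println(compiles("[\\x3041-\\x3096]"));             // false
    }
}
```

The last line is the case from the question: the failure comes from the reversed range 1-\x30, not from the characters being Japanese.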

Unicode support - Oracle Java Tutorials

  • Under no circumstances should a character range have a space around the hyphen. And `\\x3041` does not represent codepoint U+3041, it represents the three ASCII characters `0`, `4`, and `1`. (`\\x30` is U+0030, which is ASCII digit zero.) All of this is clearly explained in [the documentation for Pattern](https://docs.oracle.com/en/java/javase/12/docs/api/java.base/java/util/regex/Pattern.html). – VGR Jul 26 '19 at 13:48
  • That's right, thank you! I didn't notice the mistake during testing. As for the Unicode range for Hiragana, should we use \u3041 instead? – Tomasz Pieczkowski Jul 26 '19 at 14:19
  • Yes. But, as I said, there *must not* be a space before or after the `-`. – VGR Jul 26 '19 at 14:30
  • I copied from the [original post](https://stackoverflow.com/questions/19899554/unicode-range-for-japanese/53807563#53807563), perhaps they were incorrect with the \x? – user10385242 Jul 26 '19 at 14:31
  • @user10385242 The original post has nothing to do with Java and nothing to do with Java regular expressions. `\x3400` is simply how that answer chose to denote a Unicode codepoint value. – VGR Jul 26 '19 at 14:32