
I am writing a regex that accepts upper- and lower-case letters, digits, hyphens (-), and Unicode characters in the range 00C0-00FF.

I have seen answers that accept letters from every language with the regex \p{L}+, but I don't want to accept all of them. I only want the specific range 00C0-00FF of the Latin-1 Supplement block listed at https://unicode-table.com/en/blocks/latin-1-supplement/
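To illustrate the difference, here is a minimal sketch (the class `RangeVsLetter` and its field names are hypothetical) comparing `\p{L}` with a character class restricted to 00C0-00FF:

```java
import java.util.regex.Pattern;

public class RangeVsLetter {
    // \p{L} accepts letters from any script (Latin, Greek, Cyrillic, ...)
    static final Pattern ANY_LETTER = Pattern.compile("\\p{L}+");
    // Only code points U+00C0 to U+00FF
    static final Pattern LATIN1 = Pattern.compile("[\\u00C0-\\u00FF]+");

    public static void main(String[] args) {
        String greek = "\u03B1\u03B2";  // Greek small alpha, beta
        String french = "\u00E9\u00E8"; // e-acute, e-grave
        String times = "\u00D7";        // multiplication sign

        System.out.println(ANY_LETTER.matcher(greek).matches());  // true
        System.out.println(LATIN1.matcher(greek).matches());      // false
        System.out.println(ANY_LETTER.matcher(french).matches()); // true
        System.out.println(LATIN1.matcher(french).matches());     // true
        System.out.println(ANY_LETTER.matcher(times).matches());  // false
        System.out.println(LATIN1.matcher(times).matches());      // true
    }
}
```

The last pair shows that the raw range also admits × (U+00D7) and ÷ (U+00F7), which are math symbols rather than letters. If that matters, Java's class intersection, e.g. `[\\u00C0-\\u00FF&&[^\\u00D7\\u00F7]]`, is one way to exclude them.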

I tested my example string O’Donnell À Ö ö Ì ÿ 012 on https://regex101.com/ with the pattern [A-Za-z0-9\x{00C0}-\x{00FF}'’\- ]{1,70}, and it matches there, but the same pattern doesn't work in Java. Could you help me write an equivalent pattern for Java?

Sample code I am using to test the regex:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String... args) {
        Pattern p = Pattern.compile("[A-Za-z0-9\\x{00C0}-\\x{00FF}'’\\- ]{1,70}",
                                    Pattern.UNICODE_CHARACTER_CLASS);
        Matcher m = p.matcher("O’Donnell À Ö ö Ì ÿ 012");
        boolean b = m.matches(); // true only if the whole input matches
        System.out.println("value=" + b);
    }
}
Konrad Rudolph
Bagesh Sharma
Also relevant: https://stackoverflow.com/questions/10664434/escaping-special-characters-in-java-regular-expressions – Ani Feb 17 '21 at 17:52

2 Answers


Use \\u instead of \x, remove the curly brackets, and escape the backslash in the Java string literal, so the regex becomes:

"[A-Za-z0-9\\u00C0-\\u00FF'’\\- ]{1,70}"
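A self-contained sketch built around this pattern follows; the class `NameCheck` and method `isValid` are hypothetical names. As an assumption of this sketch, the curly apostrophe is written as the regex escape `\\u2019` instead of a literal `’`, so the example does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

public class NameCheck {
    // Hypothetical validator: A-Z, a-z, 0-9, code points U+00C0 to U+00FF,
    // straight apostrophe, curly apostrophe (U+2019), hyphen and space,
    // 1 to 70 characters in total.
    private static final Pattern NAME =
            Pattern.compile("[A-Za-z0-9\\u00C0-\\u00FF'\\u2019\\- ]{1,70}");

    static boolean isValid(String s) {
        // matches() requires the entire input to match, not just a substring
        return NAME.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("O\u2019Donnell \u00C0 \u00D6 \u00F6 \u00CC \u00FF 012")); // true
        System.out.println(isValid("O'Donnell")); // true: straight apostrophe is allowed too
        System.out.println(isValid("\u0100"));    // false: U+0100 is just past the range
        System.out.println(isValid("a_b"));       // false: underscore is not in the class
        System.out.println(isValid(""));          // false: at least one character required
    }
}
```

Using `matches()` rather than `find()` is deliberate: `find()` would accept any input that merely contains one allowed character.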
Ani

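Although the answer above works, it may fail on a Windows machine because of file encoding: if the editor saves the source file in one encoding and the compiler reads it in another (for example the Windows default code page), non-ASCII characters such as ’ and À turn into different characters. Save files that contain Unicode characters as UTF-8. It is also good practice to write special characters in strings as Unicode escapes, as in the example below.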

import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        // Unicode escapes for the string 'O’Donnell À Ö ö Ì ÿ 012'
        String str = "O\u2019Donnell \u00C0 \u00D6 \u00F6 \u00CC \u00FF 012";
        // The curly apostrophe (U+2019) is escaped in the pattern as well
        System.out.println(Pattern.matches("[A-Za-z0-9\\u00C0-\\u00FF'\\u2019\\- ]{1,70}", str));
    }
}
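The encoding problem described in this answer can be reproduced at runtime. The sketch below (the class `EncodingDemo` and helper `misreadAsWindows1252` are hypothetical) takes the UTF-8 bytes of the intended string and decodes them as windows-1252, which is assumed here to approximate what happens when a UTF-8 source file is compiled with that code page:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class EncodingDemo {
    // Same character class as the answers; the curly apostrophe is escaped,
    // so the pattern itself does not depend on the source file encoding.
    static final Pattern NAME =
            Pattern.compile("[A-Za-z0-9\\u00C0-\\u00FF'\\u2019\\- ]{1,70}");

    // Hypothetical helper: decodes the UTF-8 bytes of s as windows-1252,
    // mimicking a compiler that reads a UTF-8 file with that code page.
    static String misreadAsWindows1252(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String intended = "O\u2019Donnell \u00C0 \u00D6 \u00F6 \u00CC \u00FF 012";
        String misread = misreadAsWindows1252(intended);

        System.out.println(NAME.matcher(intended).matches()); // true
        System.out.println(NAME.matcher(misread).matches());  // false
    }
}
```

In the misread string, each two- or three-byte UTF-8 sequence becomes several characters, some outside the allowed class (for example the euro sign from byte 0x80), so the match fails even though the regex is correct.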
Bagesh Sharma