4

I have the following method to check that a string contains only Latin symbols.

private boolean containsNonLatin(String val) {
        return val.matches("\\w+");
}

But it returns false if I pass the string "my string", because it contains a space. I need a method that returns false if the string contains letters that are not in the Latin alphabet, and true in all other cases.

Please help to improve my method.

Examples of valid strings:

w123.
w, 12
w#123
dsf%&@
gstackoverflow
  • If you need to only match ASCII letters and a space, use `return val.matches("[\\p{Alpha} ]+");` – Wiktor Stribiżew Feb 09 '16 at 08:10
  • 1
    @Wiktor Stribiżew I need to return false only if I see symbols from other alphabets, Chinese for example. – gstackoverflow Feb 09 '16 at 08:14
  • `"[\\p{Alpha} ]+"` does not allow Chinese, only ASCII letters and a regular (32 dec. value) space. You can also use `[\\p{L}\\p{M}&&[^\\p{Alpha}]]+` to match one or more any letters but ASCII ones. – Wiktor Stribiżew Feb 09 '16 at 08:14
  • 1
    `%&@` are not Latin characters. If you want to allow those characters, you'll need a better-defined rule. – shmosel Feb 09 '16 at 08:21
  • @Wiktor Stribiżew this doesn't work for string **213** – gstackoverflow Feb 09 '16 at 08:57
  • It returns [true](http://ideone.com/OVfEHG) because it does not contain a letter other than a Latin one. What do you expect? – Wiktor Stribiżew Feb 09 '16 at 09:03
  • Please check my answer. I explained what the regex does: it *checks that string contains non-latin letters*. And as far as I understand, if these letters (like `ы`, `ę`) are found, you want to return false. Right? See my fiddles, and feel free to fork them to show what behavior you need. Otherwise, you should re-consider your requirements. – Wiktor Stribiżew Feb 09 '16 at 09:31

4 Answers

8

You can use \p{IsLatin} class:

return !(var.matches("[\\p{Punct}\\p{Space}\\p{IsLatin}]+$"));

Java Regex Reference
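
If the requirement is read as "return false only when some letter belongs to a non-Latin script", one option is to search for such a letter with find() instead of matching the whole string. The sketch below assumes that reading; the names LatinCheck and hasOnlyLatinLetters are made up for illustration. Digits, spaces, and punctuation are ignored because they are not letters.

```java
import java.util.regex.Pattern;

final class LatinCheck {
    // A single letter (\p{L}) whose script is not Latin.
    // Assumption: only letters matter; digits and punctuation are always allowed.
    private static final Pattern NON_LATIN_LETTER =
            Pattern.compile("[\\p{L}&&[^\\p{IsLatin}]]");

    // Hypothetical helper: true unless the string contains a non-Latin letter.
    static boolean hasOnlyLatinLetters(String val) {
        return !NON_LATIN_LETTER.matcher(val).find();
    }

    public static void main(String[] args) {
        String[] samples = {"w123.", "w, 12", "w#123", "dsf%&@", "my string", "Двв", "abc中"};
        for (String s : samples) {
            System.out.println(s + " -> " + hasOnlyLatinLetters(s));
        }
    }
}
```

Note that \p{IsLatin} covers the whole Latin script, so accented letters such as ę count as Latin in this sketch.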

anubhava
4

I need something like not p{IsLatin}

If you need to match all letters but Latin ASCII letters, you can use

"[\\p{L}\\p{M}&&[^\\p{Alpha}]]+"

The \p{Alpha} POSIX class matches [A-Za-z]. \p{L} matches any Unicode letter, and \p{M} matches combining marks (diacritics). Adding &&[^\p{Alpha}] intersects the class with everything except [A-Za-z], which in effect subtracts the ASCII letters from all the Unicode letters.

The whole expression means match one or more Unicode letters other than ASCII letters.
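
Keep in mind that String.matches() succeeds only when the entire string consists of characters from the class. To detect a non-ASCII letter anywhere in a mixed string, the same class can be used with find(). This is a minimal sketch; the names NonAsciiLetterDemo and containsNonAsciiLetter are hypothetical.

```java
import java.util.regex.Pattern;

final class NonAsciiLetterDemo {
    // Same character class as above, without the + quantifier:
    // one Unicode letter or mark that is not in [A-Za-z].
    private static final Pattern NON_ASCII_LETTER =
            Pattern.compile("[\\p{L}\\p{M}&&[^\\p{Alpha}]]");

    // Hypothetical helper: true if such a character occurs anywhere in s.
    static boolean containsNonAsciiLetter(String s) {
        return NON_ASCII_LETTER.matcher(s).find();
    }

    public static void main(String[] args) {
        String mixed = "wДвв";
        // matches() needs every character to be in the class, and 'w' is not.
        System.out.println(mixed.matches("[\\p{L}\\p{M}&&[^\\p{Alpha}]]+")); // false
        // find() only needs one occurrence.
        System.out.println(containsNonAsciiLetter(mixed)); // true
        System.out.println(containsNonAsciiLetter("213")); // false
    }
}
```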

To also allow whitespace, just add \s:

"[\\s\\p{L}\\p{M}&&[^\\p{Alpha}]]+"

See IDEONE demo:

List<String> strs = Arrays.asList("w123.", "w, 12", "w#123", "dsf%&@", "Двв");
for (String str : strs)
    System.out.println(!str.matches("[\\s\\p{L}\\p{M}&&[^\\p{Alpha}]]+")); // => 4 true, 1 false
Wiktor Stribiżew
1

Just add a space to your matcher:

private boolean isLatin(String val) {
    return val.matches("[ \\w]+");
}
shmosel
0

Use this:

public static boolean isNoAlphaNumeric(String s) {
       return s.matches("[\\p{L}\\s]+");
}
  • \p{L} matches any Unicode letter.
  • \s matches a whitespace character.
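
A quick way to see what this pattern accepts is to run it against a few inputs. The sketch below uses hypothetical names (UnicodeLetterDemo, lettersAndWhitespaceOnly). Because \p{L} includes letters of every script, Cyrillic input is accepted, while strings containing digits or punctuation are rejected.

```java
final class UnicodeLetterDemo {
    // Hypothetical wrapper around the pattern shown above.
    static boolean lettersAndWhitespaceOnly(String s) {
        return s.matches("[\\p{L}\\s]+");
    }

    public static void main(String[] args) {
        System.out.println(lettersAndWhitespaceOnly("my string")); // true
        System.out.println(lettersAndWhitespaceOnly("Двв"));       // true: Cyrillic letters are in \p{L}
        System.out.println(lettersAndWhitespaceOnly("w123."));     // false: digits and '.' are not letters
    }
}
```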
Bhuwan Prasad Upadhyay