How to detect if a string does not contain letters from languages other than English?

Question

Consider a line like:

[Hello簲  bye 簲 ]

This line contains both Chinese and English letters, which is not what I want. I want to find out whether a string contains letters from any language other than English. Any ideas?

EDIT I do not want to solve it with regex. Otherwise I would have tagged it!

Check this question http://stackoverflow.com/q/5238491/4517895 — Mikhail Chibel, Jul 03 '16 at 10:12
@MikhailChibel How to edit the answer of given link, so that the function accepts if the string contains digits and special characters? (like () <> ? ! etc) — lonesome, Jul 03 '16 at 11:18

score 0 · Answer 1 · answered Jul 03 '16 at 06:54

https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html

In the Character class, there is this:

getNumericValue(char ch) Returns the int value that the specified Unicode character represents.

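Note that this is the numeric value the character stands for (for example 5 for '5'), not its Unicode code; casting a char to int gives the code instead. With a little more research you can find the Unicode values of the English letters (65 to 90 for A-Z, 97 to 122 for a-z) and check that each char falls within those ranges.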
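As a minimal sketch of the difference between `Character.getNumericValue` and a plain cast to `int`, the following may help. The class name `CodePointDemo` and its helper methods are hypothetical names chosen for illustration; they are not part of the Java API.

```java
// Hypothetical demo class; the names here are assumptions for illustration only.
class CodePointDemo {
    // Widening a char to int yields its UTF-16 code unit value, e.g. 97 for 'a'.
    static int codeOf(char c) {
        return c;
    }

    // True if the char is in A-Z (65-90) or a-z (97-122).
    static boolean isEnglishLetter(char c) {
        int code = codeOf(c);
        return (code >= 65 && code <= 90) || (code >= 97 && code <= 122);
    }

    public static void main(String[] args) {
        System.out.println(Character.getNumericValue('5')); // 5, the digit's value
        System.out.println(codeOf('5'));                    // 53, the digit's code
        System.out.println(Character.getNumericValue('a')); // 10, not a code
        System.out.println(codeOf('a'));                    // 97
        System.out.println(isEnglishLetter('\u4E2D'));      // false (a Chinese character)
    }
}
```

So for a range check on English letters, the cast (or a direct `char` comparison) is the value to compare, not `getNumericValue`.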

Jae Min Im

Of course, you have to break the String into chars, but there could be an easier way than breaking it up into individual chars. – Jae Min Im Jul 03 '16 at 06:55

score 0 · Answer 2 · answered Jul 03 '16 at 11:13

If you don't want to use a regex, you can use the loop below:

    String str = "Hello簲  bye 簲";
    boolean isValid = true;
    for (char c : str.toCharArray()) {
        if (!(c >= 'a' && c <= 'z') && !(c >= 'A' && c <= 'Z')) {
            isValid = false;
            break;
        }
    }
    System.out.println(isValid);

Saravana

How to edit this answer so that the function accepts if the string contains digits and special characters? (like () <> ? ! etc) – lonesome Jul 03 '16 at 11:19
do you want to check all ASCII characters [0-127]? – Saravana Jul 03 '16 at 11:24
yea, something like that. oh wait, only until 127 – lonesome Jul 03 '16 at 11:25
yes, need to check `(int)c < 128` – Saravana Jul 03 '16 at 11:30
check this http://stackoverflow.com/questions/3585053/in-java-is-it-possible-to-check-if-a-string-is-only-ascii – Saravana Jul 03 '16 at 11:33
at same if clause? – lonesome Jul 03 '16 at 11:51
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/116296/discussion-between-saravana-and-lonesome). – Saravana Jul 03 '16 at 12:59
I want the string to be able to contain anything like letters, numbers and special characters, but not characters from other languages. How do we do that? – Happy Oct 04 '16 at 12:45
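The ASCII check discussed in the comments above (accept any character below 128, which covers letters, digits and the usual special characters) could be sketched roughly as follows. The class name `AsciiCheck` and method name `isAscii` are hypothetical, chosen only for this example.

```java
// Hypothetical helper; AsciiCheck and isAscii are illustrative names, not a library API.
class AsciiCheck {
    // Returns true only if every char in the string is in the ASCII range 0-127.
    static boolean isAscii(String s) {
        for (char c : s.toCharArray()) {
            if (c >= 128) {
                return false; // first non-ASCII char decides the result
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAscii("Hello (bye) <ok>? 123!")); // true
        System.out.println(isAscii("Hello\u4E2D bye"));        // false
    }
}
```

Note that this also accepts ASCII control characters such as tab and newline; whether that is desirable depends on the input you expect.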

progyammer · Answer 3 · 2016-07-09T04:03:28.180

You can use the ASCII values of the English characters in this program: digits, upper-case and lower-case letters (blank spaces must also be checked).

The logic: Iterate through each character of the String and check if the current character is an English character, i.e., its ASCII value lies between 48 and 57 (for numbers 0 - 9), 65 and 90 (for upper case alphabets) or 97 and 122 (for lower case alphabets) or is a blank space. If it's not any of these, then it's a non English character.

Here's the code:

    String s = "abcdEFGH 745401 妈妈"; // the string to check
    int illegal = 0; // counts the non-English characters
    for (int i = 0; i < s.length(); i++) {
        int c = s.charAt(i);
        // accept digits (48-57), upper case (65-90), lower case (97-122) and the blank space
        if (!((c >= 48 && c <= 57) || (c >= 65 && c <= 90) || (c >= 97 && c <= 122) || c == ' '))
            illegal++;
    }
    if (illegal > 0)
        System.out.print("String contains non english characters");
    else
        System.out.print("String does not contain non english characters");

NOTE: Make sure the environment you compile and run the program in handles these characters. Unicode supports almost all languages of the world, and Java strings are stored as UTF-16, but the source file must be saved in, and compiled with, an encoding that can represent Chinese (for example UTF-8 or UTF-16; javac accepts an -encoding option).

I tested this code on my computer with the following test data:

    String s = "abcdEFGH 745401 妈妈";

and I got the correct output because my setup read the source as Unicode. When the source is read with an encoding that cannot represent them, the Chinese letters 妈妈 may turn into ?????? or other wrong characters, and the compiler may report an error. In that case the check runs on different characters than the ones you typed, so its output (for example String does not contain non english characters for the input above) cannot be trusted. So if you're running the program in an online terminal/IDE or on a mobile phone, take care of this factor. It is usually not an issue on a Windows or Mac computer whose editor and compiler use the same encoding.
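One way to sidestep the source-encoding issue when testing is to write the non-English characters as Unicode escapes, so the source file itself stays pure ASCII. The sketch below assumes the same acceptance ranges as the code above; the class name `EscapeDemo` and method name `countNonEnglish` are hypothetical, and the escapes `\u4E2D\u6587` stand in for Chinese test characters.

```java
// Hypothetical demo; EscapeDemo and countNonEnglish are illustrative names.
class EscapeDemo {
    // Counts chars that are not a digit, an English letter, or a blank space.
    static int countNonEnglish(String s) {
        int illegal = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean ok = (c >= '0' && c <= '9')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= 'a' && c <= 'z')
                    || c == ' ';
            if (!ok) {
                illegal++;
            }
        }
        return illegal;
    }

    public static void main(String[] args) {
        // Two Chinese characters written as escapes, independent of the file encoding.
        String s = "abcdEFGH 745401 \u4E2D\u6587";
        System.out.println(countNonEnglish(s)); // 2
    }
}
```

Because the escapes are resolved by the compiler, the result should not depend on how the editor or terminal encodes the source file.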

I hope it helps you.
