1

Consider a line like:

[Hello簲  bye 簲 ]

This line contains both Chinese and English characters, which is not what I am interested in. I want to check whether a string contains no letters from any language other than English. Any ideas?

EDIT: I do not want to solve this with regex; otherwise I would have tagged it as such!

lonesome

3 Answers

0

https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html

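In the Character class, there is this method:

getNumericValue(char ch) Returns the int value that the specified Unicode character represents.

With a little more research you can find the Unicode values of the English letters and check whether each char falls within those ranges. Note that getNumericValue returns the numeric value a character represents (for example, 10 for 'a'), not its code point; to get the code point of a char, cast it to int or compare it against char literals directly.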
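As a minimal sketch of that range check (the class name `CodePointDemo` and the helper `isEnglishLetter` are hypothetical names chosen for illustration, not part of the Java API), the following shows how the cast to int differs from getNumericValue and how a char can be compared against the letter ranges:

```java
class CodePointDemo {
    // Hypothetical helper: true only for the 52 basic Latin letters A-Z and a-z.
    static boolean isEnglishLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    public static void main(String[] args) {
        System.out.println((int) 'A');                      // 65, the code point of 'A'
        System.out.println((int) 'a');                      // 97, the code point of 'a'
        System.out.println(Character.getNumericValue('a')); // 10, not the code point
        System.out.println(isEnglishLetter('q'));           // true
        System.out.println(isEnglishLetter('簲'));          // false
    }
}
```

Comparing against char literals such as 'A' and 'z' avoids having to look up or memorize the numeric ranges at all.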

Jae Min Im
  • Of course, you have to break the String into chars, but there could be an easier way than splitting it into individual chars. – Jae Min Im Jul 03 '16 at 06:55
0

If you don't want to use a regex, you can use the loop below:

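    String str = "Hello簲  bye 簲";
    boolean isValid = true;
    for (char c : str.toCharArray()) {
        // Anything outside a-z and A-Z (including spaces and digits) is invalid.
        if (!(c >= 'a' && c <= 'z') && !(c >= 'A' && c <= 'Z')) {
            isValid = false;
            break; // one invalid character is enough to decide
        }
    }
    System.out.println(isValid);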
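Because the loop above also rejects spaces, a plain English string such as "Hello bye" would print false. A possible variant, sketched here under the assumption that spaces should be allowed, uses String.chars() with allMatch so there is no explicit flag or break. The class and method names are hypothetical:

```java
class LettersAndSpaces {
    // Hypothetical predicate: accepts a-z, A-Z and the space character only.
    static boolean isEnglishLetterOrSpace(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == ' ';
    }

    // True when every UTF-16 code unit in s passes the predicate.
    // Note that an empty string is reported as valid.
    static boolean isEnglishText(String s) {
        return s.chars().allMatch(LettersAndSpaces::isEnglishLetterOrSpace);
    }

    public static void main(String[] args) {
        System.out.println(isEnglishText("Hello  bye "));    // true
        System.out.println(isEnglishText("Hello簲  bye 簲")); // false
    }
}
```

allMatch short-circuits on the first failing character, so it behaves like the break in the loop version.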
Saravana
0

You can use the ASCII values of the English characters in this program: digits, upper case and lower case letters (blank spaces must be checked as well).

The logic: iterate through each character of the String and check whether it is an English character, i.e., its ASCII value lies between 48 and 57 (for the digits 0 - 9), 65 and 90 (for upper case letters) or 97 and 122 (for lower case letters), or it is a blank space. If it is none of these, it is a non-English character.

Here's the code:

    String s = "abcdEFGH 745401 妈妈"; // the string to check
    int illegal = 0; // counts the non-English characters
    for (int i = 0; i < s.length(); i++) {
        int c = s.charAt(i); // a char widens to its numeric value
        // digits 48-57, upper case 65-90, lower case 97-122, or a blank space
        if (!((c >= 48 && c <= 57) || (c >= 65 && c <= 90) || (c >= 97 && c <= 122) || c == ' ')) {
            illegal++;
        }
    }
    if (illegal > 0) {
        System.out.print("String contains non english characters");
    } else {
        System.out.print("String does not contain non english characters");
    }

NOTE: Make sure the platform you run the program on handles these characters correctly. Java strings are UTF-16 internally, so Chinese characters can always be represented at run time. What matters is that the source file is saved in a Unicode encoding such as UTF-8 and that the compiler reads it with the same encoding (for example, via javac -encoding UTF-8).

I tested this code on my computer with the following test data:

String s = "abcdEFGH 745401 妈妈";

and I got the correct output with the source handled as Unicode. If the source encoding is not handled correctly, the compiler may read the Chinese characters 妈妈 as ?????? or other garbage. The program then checks those replacement characters instead of the text you intended, so its output may not reflect the real input. If you are running the program in an online terminal/IDE or on a mobile phone, take care of this factor. On a typical Windows or Mac setup where the source is saved and compiled with the same Unicode encoding, this should not be a problem.
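One way to sidestep the source-encoding question is to write the non-ASCII characters as Unicode escapes, which are plain ASCII in the source file. The sketch below assumes the same test data as above, with 妈 written as \u5988; the class name `EscapedInput` and the method `countIllegal` are hypothetical:

```java
class EscapedInput {
    // Hypothetical helper: counts characters that are not 0-9, A-Z, a-z or a space.
    static int countIllegal(String s) {
        int illegal = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean ok = (c >= '0' && c <= '9')
                    || (c >= 'A' && c <= 'Z')
                    || (c >= 'a' && c <= 'z')
                    || c == ' ';
            if (!ok) {
                illegal++;
            }
        }
        return illegal;
    }

    public static void main(String[] args) {
        // \u5988 is the escape for 妈, so this source file contains only ASCII.
        String s = "abcdEFGH 745401 \u5988\u5988";
        System.out.println(countIllegal(s)); // 2
    }
}
```

Since the compiler translates the escapes itself, the result does not depend on how the file was saved or which -encoding option was used.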

I hope it helps you.

progyammer