Iterate through Unicode characters dynamically

Question

I am writing an application in Android Studio that can count the occurrences of each letter of a sentence. Ex:

// Input
String sentence = "abbdddd";

// Output
a:1; b:2; c:0; d:4; e:0; f:0; // And so on

However, I also want it to count Amharic characters, so if I put in:

String sentence = "abcሀሁሂ";

It would give me:

a:1; b:1; c:1 ... ሀ:1; ሁ:1; ሂ:1;

At the moment, I have two ArrayLists, cycle and letterCount. cycle holds every character that a letter of the input sentence could be. letterCount is the same size as cycle, and every value in it starts at zero. When you type in a sentence, the app looks up each letter in cycle (which should succeed if the letter is English or Amharic). When it finds a match, it adds one to the value at the same index in letterCount. So if the first letter in the sentence is "a", it adds one to the first value of letterCount; if it is "c", it adds one to the third value. Both lists are filled dynamically using a for loop:

    for (int i = 97; i < 123; i++) {
        char val = (char)i; // This is where the problem lies...I think
        cycle.add(val);
        letterCount.add(0);
    }

However, doing "(char)i" converts it to an ASCII character, which doesn't include Amharic characters. So is there a way to loop through Unicode characters instead of ASCII and add them to cycle? Any help would be greatly appreciated.

score 1 · Accepted Answer · edited May 23 '17 at 12:00


The Ethiopic block, U+1200 to U+137F, contains the characters used to write Amharic. It lies in the BMP (Basic Multilingual Plane), so each of its characters can be represented by a single 16-bit value.

doing "(char)i" converts it to an ASCII character [???]

False. Unlike in some other languages, a char in Java is 2 bytes wide (a 16-bit UTF-16 code unit), so it is sufficient for your purposes.

For more information see: Comparing a char to a code-point?
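
As a rough illustration of the point above, here is a minimal sketch that loops over the Ethiopic block by integer value and casts each value to char. It is not the asker's code: the class name `LetterCounter`, the method `count`, and the use of a Map in place of the two parallel ArrayLists are assumptions made for the example, as is the choice to skip code points that `Character.isLetter` does not report as letters.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; names are not from the original post.
class LetterCounter {
    // Ethiopic block bounds written as hexadecimal int literals (U+1200 to U+137F).
    static final int ETHIOPIC_START = 0x1200;
    static final int ETHIOPIC_END = 0x137F;

    static Map<Character, Integer> count(String sentence) {
        // A Map replaces the parallel cycle/letterCount lists (an assumption of this sketch).
        Map<Character, Integer> counts = new LinkedHashMap<>();

        // Seed the lowercase English letters with zero.
        for (char c = 'a'; c <= 'z'; c++) {
            counts.put(c, 0);
        }

        // Seed the Ethiopic letters with zero. The cast to char is safe here
        // because every value in this range fits in 16 bits.
        for (int i = ETHIOPIC_START; i <= ETHIOPIC_END; i++) {
            if (Character.isLetter(i)) { // skips unassigned code points, digits, punctuation
                counts.put((char) i, 0);
            }
        }

        // Increment only the characters that were seeded above.
        for (char c : sentence.toCharArray()) {
            counts.computeIfPresent(c, (key, n) -> n + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // "\u1200\u1201\u1202" spells the three Amharic letters from the question.
        Map<Character, Integer> counts = count("abc\u1200\u1201\u1202");
        System.out.println(counts.get('a'));      // count for 'a'
        System.out.println(counts.get('\u1200')); // count for U+1200
    }
}
```

A Map keeps each character next to its count, so there is no index bookkeeping between two lists; the same loops would work with the original ArrayLists if that structure is preferred.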


answered Dec 18 '16 at 15:48

Patrick Parker


Thanks for your response! However, what I meant was that from what I've searched up, the Unicode values for Amharic include letters (ex: \u126B), and I'm not sure how to loop through the letters as well as the numbers. I also don't want to hardcode it into the ArrayLists, because there are a lot of Amharic characters. Should I reword my question? – Mister_Maybe Dec 18 '16 at 16:31
I'm still not sure what you are trying to ask. You need to decide how to express exactly what parts you do not know how to do. For example, do you realize that a Unicode codepoint can be written as an integer with hexadecimal literal notation? `int i = 0x126B` – Patrick Parker Dec 18 '16 at 17:19
I solved it now. What I needed was some way (doesn't have to be Unicode) to convert numbers into the corresponding characters. The code I used is this: `for (int i = 4608; i < 4954; i++) { char val = (char)i; cycle.add(val); letterCount.add(0); }` I didn't want any letters because I wanted to put it in a for loop and iterate through it. Anyways, thanks for your help! :) – Mister_Maybe Dec 18 '16 at 17:50
@Mister_Maybe In case it's not clear, `char` is a UTF-16 code unit, one or two of which encode a Unicode codepoint. So, your references to "doesn't have to be Unicode" and "ASCII character" are confusing. To expand on @Patrick's comment, UTF-16 is used by Java, JavaScript, C#, VB, Windows API, Windows NTFS, and almost always by Linux ext3, ext4. Further, Unicode is used by HTML, CSS, XML, …. Unless you are looking at a spec (e.g., [RFC 7540](https://tools.ietf.org/html/rfc7540#section-8.1.2)), you can assume that references to ASCII are inappropriate. – Tom Blodget Dec 18 '16 at 22:13
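
To make the code unit versus code point distinction in the last comment concrete, here is a small sketch that counts by code point instead of by char. The class name `CodePointCounter` and its `count` method are hypothetical, and U+1E7E0 is used only as an example of a value above U+FFFF, which needs two chars in UTF-16.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the original thread.
class CodePointCounter {
    // Counts occurrences keyed by Unicode code point rather than by char.
    static Map<Integer, Integer> count(String text) {
        Map<Integer, Integer> counts = new TreeMap<>();
        // codePoints() joins surrogate pairs, so each character is seen once.
        text.codePoints().forEach(cp -> counts.merge(cp, 1, Integer::sum));
        return counts;
    }

    public static void main(String[] args) {
        // One Latin letter, U+1200 twice, and one code point above U+FFFF.
        String text = "a\u1200\u1200" + new String(Character.toChars(0x1E7E0));

        System.out.println(text.length());                            // number of UTF-16 code units
        System.out.println(text.codePointCount(0, text.length()));    // number of code points
        System.out.println(count(text).get(0x1200));                  // occurrences of U+1200
    }
}
```

For the range U+1200 to U+137F the two approaches give the same result, since every character there is a single char; the difference only matters once the input can contain characters outside the BMP.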
