How to access Unicode text stored in a database or file in Java

Question

I stored some words, each paired with its Unicode escape sequence. For example:

arasu "\u0C85\u0CA5\u0CB2\u0CC1";
aadalu "\u0C86\u0CA1\u0CB2\u0CC1";

These words and their escape sequences are stored in a text file. Whenever we read a word and its escape sequence from this file, the program should display the word in Kannada. We first read the escape sequence from the text file and assign it to a string. For example, suppose we read the first sequence, "\u0C85\u0CA5\u0CB2\u0CC1", from the text file in a Java program and store it in the string 'str'. Now str holds "\u0C85\u0CA5\u0CB2\u0CC1". If we pass this string to the following code:

JFrame frame= new JFrame("TextArea frame");
GraphicsEnvironment.getLocalGraphicsEnvironment();
Font font= new Font("Tunga",Font.PLAIN,40);
JPanel panel=new JPanel();
JTextArea jt=new JTextArea(str);
jt.setFont(font);
frame.add(panel);
panel.add(jt);
frame.setSize(250,200);
frame.setVisible(true);
frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

then it should display the output in Kannada. But it does not display Kannada text; it displays square boxes instead. How can we resolve this problem? Please help.

Are you sure that the font you're selecting supports the unicode characters you're trying to display? — David Gelhar, May 04 '12 at 15:20

score 0 · Answer 1 · edited May 23 '17 at 12:20

The square box means "the selected font doesn't support this glyph" (a glyph is usually a character).

Most fonts don't offer full coverage (= glyphs for all Unicode code points). One reason is that there are many of them (Unicode 6.1 assigns 249,763 code points). Another is that "Unicode" is an evolving standard, and fonts don't magically "update" themselves when new characters come out.

The solution in your case is to use a font which offers more glyphs.

If you're on Windows, try "Arial" - it has pretty decent Unicode support.

A cheap test is to download the HTML page http://www.alanwood.net/unicode/kannada.html and put different font names into this CSS element:

td.b {font-size:1.8em;}

To try Arial, use this code:

td.b {font-size:1.8em; font-family: Arial;}

Replace the font names until you're happy with the result. If the font name contains spaces, you need to quote the name:

td.b {font-size:1.8em; font-family: "Times New Roman";}

[EDIT] jarnbjo just mentioned that most browsers will silently select a different font if the one you asked for lacks the characters on the page, so the HTML test can be misleading.

So try this instead: Write a small Java program which calls font.canDisplay(int codePoint), where codePoint is the code point of a character you need. Start with 0x0CC1 since it seems to be a common character. If the method returns true, the font supports that glyph.

To get all fonts, call GraphicsEnvironment.getLocalGraphicsEnvironment().getAvailableFontFamilyNames() to get a list of names. Try each one and print those where font.canDisplay(0x0CC1) returns true.

If you want the "best" font, count how many characters between 0x0C82 and 0x0CF2 are supported and sort the list of fonts by that number.
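The three steps above can be sketched roughly as follows. This is a minimal illustration, not a tested tool: the class name FontCoverage and the helper countSupported are made up for this example, the point size 12 is arbitrary, and the range 0x0C82 to 0x0CF2 is simply the one suggested above.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

public class FontCoverage {
    // Range suggested in the answer (inside the Kannada block).
    static final int FIRST = 0x0C82;
    static final int LAST = 0x0CF2;

    // Hypothetical helper: counts the code points in [first, last]
    // (both inclusive) that the given predicate accepts.
    public static int countSupported(IntPredicate canDisplay, int first, int last) {
        int count = 0;
        for (int cp = first; cp <= last; cp++) {
            if (canDisplay.test(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] families = GraphicsEnvironment.getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames();

        List<Map.Entry<String, Integer>> results = new ArrayList<>();
        for (String family : families) {
            Font font = new Font(family, Font.PLAIN, 12);
            // Quick filter: skip fonts that cannot show U+0CC1 at all.
            if (font.canDisplay(0x0CC1)) {
                int count = countSupported(cp -> font.canDisplay(cp), FIRST, LAST);
                results.add(new AbstractMap.SimpleEntry<>(family, count));
            }
        }

        // Best coverage first.
        results.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        for (Map.Entry<String, Integer> entry : results) {
            System.out.println(entry.getValue() + "\t" + entry.getKey());
        }
    }
}
```

If the program prints nothing, no installed font family reports support for U+0CC1, which would itself explain the square boxes.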

answered May 04 '12 at 15:35

Aaron Digulla

Using HTML and CSS to check which characters are available in a specific font is not a good idea. Most browsers automatically select a different font if the characters are not available in the requested font. – jarnbjo May 04 '12 at 15:40
Hm ... that's new to me but it makes sense. Any other simple way to check whether a font has the necessary characters? – Aaron Digulla May 04 '12 at 15:45
With Java, you can check if a character is contained in a font with one of the `java.awt.Font#canDisplay()` methods. – jarnbjo May 04 '12 at 15:50
Actually there is no problem with the fonts. The actual problem is this: if we assign the Unicode text to a string directly in the code and pass it to a text area, the output is correct. But if we read the Unicode text from a text file (we read the file using the code "File myFile=newfile("example.txt"); string next_line=newfile.readLine();"), store the result in a string, and pass it to the text area, the output is wrong. – user1340371 May 05 '12 at 04:57
Ah.. that's the usual encoding bug. See this answer: http://stackoverflow.com/a/2049472/34088 – Aaron Digulla May 05 '12 at 23:16
Thanks. This code is used to read (access) the file randomly: [ File f = new File(....filepath...); RandomAccessFile access=new RandomAccessFile(f,"r"); access.seek(0); ]. In this case, how do we specify the encoding? – user1340371 May 07 '12 at 06:48
Use Java 6 and NIO to create a CharBuffer to read from the file. – Aaron Digulla May 07 '12 at 08:06
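
One way the NIO suggestion could look for a file opened with RandomAccessFile is sketched below. It assumes the file is encoded as UTF-8, which the thread never states, and that the offset falls on a character boundary. The class name OffsetReader and the method readUtf8 are invented for this example, and it relies on StandardCharsets and try-with-resources, so it needs Java 7 or later rather than Java 6.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class OffsetReader {
    // Reads up to `length` bytes starting at byte `offset` and decodes them
    // as UTF-8 (an assumption about the file; change the charset if needed).
    public static String readUtf8(String path, long offset, int length) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r");
             FileChannel channel = file.getChannel()) {
            ByteBuffer bytes = ByteBuffer.allocate(length);
            channel.position(offset);
            // Keep reading until the buffer is full or the file ends.
            while (bytes.hasRemaining() && channel.read(bytes) != -1) {
                // nothing to do; read() advances the buffer
            }
            bytes.flip();
            // Charset.decode substitutes U+FFFD for malformed input
            // instead of throwing.
            CharBuffer chars = StandardCharsets.UTF_8.decode(bytes);
            return chars.toString();
        }
    }
}
```

The point of the sketch is that bytes become characters only through an explicitly named charset, so the result no longer depends on the platform default encoding.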

score 0 · Answer 2 · answered May 04 '12 at 15:47

It looks OK here (the original post showed a screenshot of the text area at this point).

Are you sure that str contains what you expect and that there are no character encoding issues when e.g. reading and writing text files?
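
A quick way to check what str really contains is to print its UTF-16 code units. The sketch below also covers one possible cause, stated here as an assumption the thread does not confirm: the file may hold the escape text literally (a backslash, the letter u and four hex digits). The Java compiler translates such escapes only in source code, never in data read at run time. The class name EscapeCheck and the methods dump and unescape are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeCheck {
    // Matches a literal backslash, the letter u, and four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Renders every UTF-16 code unit of s as U+XXXX, separated by spaces.
    public static String dump(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return out.toString();
    }

    // Replaces each literal escape sequence with the character it names;
    // all other text is copied unchanged.
    public static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Escapes in source code are translated by the compiler: 2 chars.
        String fromSource = "\u0C85\u0CA5";
        // What a reader returns if the file holds the escape text itself: 12 chars.
        String fromFile = "\\u0C85\\u0CA5";

        System.out.println(dump(fromSource));
        System.out.println(dump(fromFile));
        System.out.println(dump(unescape(fromFile)));
    }
}
```

If the dump of the string read from the file starts with U+005C U+0075, the string holds escape text rather than Kannada characters, and converting it (as unescape does) before passing it to the JTextArea may be all that is needed.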

jarnbjo

Yes. The expected output looks like the one you posted above. The string str contains the Unicode text. First we read a text file (in which we have already stored some Unicode text). After reading the file we store the result in the string str. Then we pass it to the text area in the JFrame. But the output shows square boxes rather than the output you posted. – user1340371 May 05 '12 at 04:22

score 0 · Answer 3 · answered May 04 '12 at 15:51

It's possible that your Java environment does not include a font named "Tunga", and that you're instead getting some kind of default font (without support for the glyphs you need).

After you do

Font font= new Font("Tunga",Font.PLAIN,40);

you'll have some font, but not necessarily exactly what you asked for. You might try logging the results to see where you are:

System.out.println("Font = "+font);

For example, on my machine, it logs:

Font = java.awt.Font[family=Dialog,name=Tunga,style=plain,size=40]

-- it's falling back to the "Dialog" font family because I don't have "Tunga".

You can look at

GraphicsEnvironment.getLocalGraphicsEnvironment().getAvailableFontFamilyNames();

to see what fonts are available to you.

After reading the Unicode text, the result is stored in the string str and then passed to the text area; the output is then wrong. — user1340371, May 05 '12 at 04:43
