I am working on a project that deals with foreign languages. I have this String in Java:
String string = "áçñéüöëéóíóíóíóíííããéíáéíáççãÓłńńāņšøøøøééèèÜüÜüééíéáéáříříççááññïïššäääééèèááéáéáéáéáéáéáèèèèííéèéèáééÇÇééééííüüüüííøøáááá¿¿ííóé̌Íá̌íáææööíÁíÁíííłççññá璇üşİüşİöğöğşşııããßßôèôèêééççáÁáÁééééééÇóóéíêööééííððññáñáñÓúÓúíłńłńååéééëëááéí¿¿ééÖÖáéáéöğÖüöğÖüçŞçŞııçııçııİİşİşíáíáéüüÉÉéééøññïíéé";
and I have saved my Java source file in UTF-8 encoding.
I want to remove duplicate characters, sort the remaining characters by their Unicode code points, and finally print the resulting string and save it to a text file (in UTF-8 or another Unicode encoding).
I don't know if it is because of the terminal: I am working in Eclipse (Windows) and I see '?' (question mark) when printing some of the characters. What is the correct way to print the string?
I am also not sure how to SAFELY remove duplicate characters and sort them. For example, if I use String.charAt() and HashSet<Character>, is it safe to do so in my case? Will I get half a character for some multi-byte characters? What is a safe way to compare these characters?
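For reference, here is a minimal sketch of the alternative I am considering: iterating over code points instead of char values, so that a character outside the Basic Multilingual Plane (stored as a surrogate pair) is never split. The class and method names are made up for illustration, and the literals use \u escapes only so the example does not depend on the source file encoding. Note that this treats a combining mark (such as U+030C) as its own character, which may or may not be what I want.

```java
import java.util.stream.Collectors;

public class UniqueCodePoints {

    // Hypothetical helper: returns the distinct code points of the input,
    // sorted in ascending code point order.
    static String uniqueSorted(String input) {
        return input.codePoints()      // IntStream of full code points, not 16-bit chars
                .distinct()            // drop duplicates
                .sorted()              // numeric order of the code point values
                .collect(StringBuilder::new,
                         StringBuilder::appendCodePoint,
                         StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        // Same characters as the short reproduction string: "¿æŁéİüłïąņąø"
        String str = "\u00bf\u00e6\u0141\u00e9\u0130\u00fc\u0142\u00ef\u0105\u0146\u0105\u00f8";
        String result = uniqueSorted(str);

        // List each surviving code point in U+XXXX form, which is readable
        // even if the console cannot display the characters themselves.
        System.out.println(result.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" ")));
    }
}
```

With charAt() and HashSet<Character>, all characters in my current string would probably be fine because they fit in a single char, but a surrogate pair would be stored as two separate half-characters.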
Knowing that the project may deal with a very large variety of different languages, what is a safe way to save the string to a text file?
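This is roughly what I have in mind for the file part: a sketch that always names the charset explicitly instead of relying on the platform default. The class name, method names, and file name are placeholders of mine, not anything from the project.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SaveUtf8 {

    // Writes the text as UTF-8 bytes, regardless of the platform default charset.
    static void save(Path target, String text) throws IOException {
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
    }

    // Reads the file back, decoding it explicitly as UTF-8.
    static String load(Path source) throws IOException {
        return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path out = Paths.get("unique-chars.txt"); // placeholder file name
        String text = "\u00bf\u00e6\u0141\u00e9\u0130\u00fc"; // "¿æŁéİü"
        save(out, text);
        // If the encoding round-trips correctly, this prints true.
        System.out.println(load(out).equals(text));
    }
}
```

As far as I understand, UTF-8 can encode every Unicode code point, so the main risk would be any step that silently falls back to the default charset.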
Update: To reproduce the question mark problem:
String str = "¿æŁéİüłïąņąø";
System.out.println(str);
It prints out this on my Eclipse console:
¿æ?é?ü?ï???ø
Note: I am already using GNU FreeMono for the console font, which has very good foreign character coverage.
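My current guess is that the console encoding, not the font, is replacing the characters. The sketch below seems consistent with that: encoding the reproduction string with a single-byte charset turns exactly the same characters into '?'. I am using ISO-8859-1 only as a stand-in, since I have not confirmed which encoding my Eclipse console actually uses; the class and method names are made up.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConsoleEncodingCheck {

    // Encodes the text with the given charset and decodes it again.
    // String.getBytes substitutes '?' for characters the charset cannot represent.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Same characters as the reproduction string: "¿æŁéİüłïąņąø"
        String str = "\u00bf\u00e6\u0141\u00e9\u0130\u00fc\u0142\u00ef\u0105\u0146\u0105\u00f8";

        // Wrap System.out so the bytes sent to the console are UTF-8.
        // This only helps if the console itself is set to interpret UTF-8.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");

        out.println(roundTrip(str, StandardCharsets.ISO_8859_1)); // lossy stand-in
        out.println(roundTrip(str, StandardCharsets.UTF_8));      // lossless
    }
}
```

If this is the cause, I assume I would also need to change the console encoding in the Eclipse run configuration to UTF-8, otherwise the UTF-8 bytes would just be displayed as garbage.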