I'm working with the last.fm API. Sometimes a query returns tracks whose names contain characters like these:

Æther, é, Hṛṣṭa

or non-English characters like these:

水鏡.

When debugging in Eclipse, I see them just fine (as-is), but printing to the console shows them as ???, which is OK for me.

Now, how do I handle these? At first I thought I could remove every song that has any character other than the ones in the English language. I used the regex ^\\w+$ but it didn't work. I also tried \\w+. That didn't work either.

Then I thought further about how to handle these properly. Can anyone help me out? I am perfectly fine with leaving these tracks out of the equation, i.e. I'm fine with having only tracks with English characters.

Another question: What is the best way to display these characters on the console and/or in a Swing GUI?

Bobulous
Karan Goel

1 Answer

First, you must ensure that you use the correct encoding when reading your input.
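As a minimal sketch of what "correct encoding when reading" can look like in Java, the snippet below decodes raw bytes with an explicit charset instead of the platform default. It assumes the API response is UTF-8 encoded, which is an assumption here and should be checked against the actual response headers; `DecodeSketch` and `decodeUtf8` are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

final class DecodeSketch {
    // Decode raw bytes as UTF-8 explicitly rather than relying on the
    // platform default charset, which can differ between machines.
    // Assumption: the input really is UTF-8.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8 bytes for "Æther": Æ (U+00C6) is encoded as 0xC3 0x86.
        byte[] raw = {(byte) 0xC3, (byte) 0x86, 't', 'h', 'e', 'r'};

        String right = decodeUtf8(raw);
        // Decoding the same bytes with the wrong charset turns one
        // character into two unrelated ones.
        String wrong = new String(raw, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals("\u00C6ther")); // true
        System.out.println(right.length());             // 5
        System.out.println(wrong.length());             // 6
    }
}
```

When reading from a stream rather than a byte array, the same idea applies by passing the charset to the reader, e.g. `new InputStreamReader(stream, StandardCharsets.UTF_8)`.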

Second, ensure that the font used in Eclipse on the platform you are developing on is able to display all these characters. Swing should display Unicode characters if you read them correctly.

You will likely want to use UTF-8 everywhere.
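On the output side, one possible sketch is to wrap the console stream in a `PrintStream` that encodes as UTF-8. This only controls the bytes that are written; the console still has to interpret them as UTF-8 and have a font with the needed glyphs. The same sketch includes a fallback for the "English only" idea from the question: `\w` in Java matches only letters, digits and underscore by default, so `^\\w+$` also rejects spaces and punctuation, whereas `\p{ASCII}` accepts any ASCII character. `OutputSketch`, `utf8Stream` and `isAsciiOnly` are hypothetical names, not part of any library.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

final class OutputSketch {
    // Wrap a stream so that text printed to it is encoded as UTF-8.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // True if the title is non-empty and contains only ASCII characters
    // (spaces and punctuation included).
    static boolean isAsciiOnly(String title) {
        return title.matches("\\p{ASCII}+");
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        // Unicode escapes keep this source file independent of its own encoding.
        String[] titles = {"Hello World", "\u00C6ther", "\u6C34\u93E1"};
        for (String title : titles) {
            out.println(title + " -> ASCII only: " + isAsciiOnly(title));
        }
    }
}
```

Filtering with `isAsciiOnly` drops titles such as "Æther" entirely, so it is a workaround rather than a fix; keeping UTF-8 end to end preserves them.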

Vladimir
  • So if I have the correct encoding on my development platform, is it guaranteed that the program will work fine on other platforms as well? – Karan Goel Mar 27 '13 at 20:56
  • If you give your correctly working app to another user, they will also need a suitable font in the console on their platform. E.g. on Linux some default fonts may not cover all of Unicode. – Vladimir Mar 27 '13 at 21:01
  • so basically there are three issues with encoding: 1. how you read your input. 2. how you write output and 3. fonts which display output :-) – Vladimir Mar 27 '13 at 21:02