5

I have UTF-8 text stored in a DB and served as text/plain; charset=utf-8 in a web application. Everything works fine there: I can see the UTF-8 text in the browser window without any problem.

But when I save that text to a file and open it in Windows Notepad, some characters are missing and are displayed as small rectangular boxes. However, the same text file looks fine in other editors such as EditPlus and Notepad++.

How is this caused and how can I solve it?

BalusC
JAVAGeek
  • If I understand correctly, the characters are only broken when you open the file with Notepad, and everything is okay in all other tools. In that case I would suspect that Notepad is unable to cope with UTF-8, which would not surprise me. In Notepad++ you should be able to see the encoding in use. I don't have the English version at hand, but it should be the fifth menu from the left, called "Encoding". That should show you what encoding the file is stored in. – Candlejack Jul 10 '12 at 12:07

3 Answers

3

If it looks fine in other editors, then the text itself is fine. If it looks OK in the browser, then the response is probably fine too (but it is worth checking the page info in the browser to see which encoding it reports). Your problem is probably with Notepad itself. Sometimes it requires a BOM (byte order mark) at the start of the file to detect the encoding properly, but a BOM can break other applications that don't support it. You should also try Notepad on different versions of Windows. I have just tried opening a UTF-8 file in Notepad on Windows 7, and it looks fine to me.
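If you want to test whether a BOM helps, here is a minimal sketch of writing the saved text with a UTF-8 BOM in front. The class name `Utf8BomWriter`, its method names, and the temp-file name are made up for illustration, and I am assuming you control the code that writes the file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any library.
final class Utf8BomWriter {
    // The UTF-8 byte order mark: U+FEFF encoded as the bytes EF BB BF.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns the UTF-8 bytes of the text, prefixed with the BOM.
    static byte[] encodeWithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[UTF8_BOM.length + body.length];
        System.arraycopy(UTF8_BOM, 0, out, 0, UTF8_BOM.length);
        System.arraycopy(body, 0, out, UTF8_BOM.length, body.length);
        return out;
    }

    // Writes the BOM-prefixed bytes to the given file, replacing its content.
    static void writeWithBom(Path target, String text) throws IOException {
        Files.write(target, encodeWithBom(text));
    }

    public static void main(String[] args) throws IOException {
        // Temp file used only for this demo; open it in Notepad to compare.
        Path target = Files.createTempFile("utf8-bom-demo", ".txt");
        writeWithBom(target, "caf\u00e9 \u4e2d\u6587");
        System.out.println("Wrote " + Files.size(target) + " bytes to " + target);
    }
}
```

Keep in mind the caveat above: only add the BOM to files meant for tools that expect it, since other consumers may treat the three extra bytes as content.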

Sergei Tachenov
  • I can see the encoding in Notepad++: it's ANSI, whereas I want it to be UTF-8. – JAVAGeek Jul 10 '12 at 12:10
  • @JAVAGeek, if it's really ANSI, then Notepad shouldn't have any problems reading it. That means Notepad++ is wrong, and it's not ANSI. By "UTF-8" Notepad++ means "UTF-8 with BOM", which isn't strictly correct, as UTF-8 without a BOM is UTF-8 too. To be sure, look at your file in a hex viewer: if characters outside 7-bit ASCII are encoded as 2 or more bytes, then it's really UTF-8. – Sergei Tachenov Jul 11 '12 at 10:44
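
If no hex viewer is at hand, the same check can be sketched in a few lines of Java. `Utf8Inspector` and its methods are hypothetical names, and ISO-8859-1 stands in for "ANSI" here purely as an assumption (on Windows, "ANSI" usually means a code page such as Windows-1252).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class Utf8Inspector {
    // True if the data starts with the UTF-8 BOM bytes EF BB BF.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // True if any byte is outside 7-bit ASCII (high bit set).
    static boolean hasNonAscii(byte[] data) {
        for (byte b : data) {
            if ((b & 0x80) != 0) return true;
        }
        return false;
    }

    // Strict decode: malformed byte sequences are reported, not replaced.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Space-separated uppercase hex dump, like a hex viewer would show.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // ISO-8859-1 is an assumed stand-in for a single-byte "ANSI" encoding.
        byte[] ansi = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(toHex(utf8) + " | valid UTF-8: " + isValidUtf8(utf8));
        System.out.println(toHex(ansi) + " | valid UTF-8: " + isValidUtf8(ansi));
    }
}
```

A file that contains non-ASCII bytes and still passes the strict decode is very likely UTF-8, whether or not it starts with a BOM. To run this on the real file, read its bytes with `Files.readAllBytes` first.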
-2

If you use Tomcat as the application server, you might want to add this JVM option to its startup configuration (for example through JAVA_OPTS or CATALINA_OPTS): "-Dfile.encoding=UTF-8"

Also, take a look here: Setting the default Java character encoding?
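
As a rough illustration of what that setting influences: the JVM default charset is used only where code does not name a charset itself. The sketch below (the class name `ExplicitCharsetDemo` is invented for the example) prints the default and shows an encoding call that does not depend on it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
final class ExplicitCharsetDemo {
    // Names the charset explicitly, so the result does not depend on the JVM default.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Shows which default charset this JVM ended up with.
        System.out.println("JVM default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8 byte count for U+00E9: " + encodeUtf8("\u00e9").length);
    }
}
```

Wherever the application writes the file itself, passing the charset explicitly like this is usually more predictable than relying on the default.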

Community
mihaisimi
-2

You need to set the content type and character encoding on the response, as below:

response.setContentType("text/html; charset=utf-8");
response.setCharacterEncoding("UTF-8");

Minh
  • Please consider adding more explanation to your answer, for example, explaining where OP went wrong or why your solution works – CallumDA Nov 26 '16 at 10:23