7

Is there a way to check whether a text file (.txt) is encoded as Unicode or UTF-8 in Java?

Andrew Thompson
Zookey

2 Answers

11

You cannot know with absolute certainty which charset is used in the general case. I found this to be a good read, especially the section "Automatic detection of encoding":

http://illegalargumentexception.blogspot.co.uk/2009/05/java-rough-guide-to-character-encoding.html
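
One check that can be done with the standard library alone is to test whether the bytes are well-formed UTF-8. Here is a minimal sketch of that idea; the class name `Utf8Check` and the method `isValidUtf8` are made up for illustration and are not taken from the linked article. It can only rule UTF-8 out, never prove it: plain ASCII passes too, and other encodings can pass by coincidence.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class Utf8Check {

    /** Returns true if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        // By default decoders used by String/Reader silently replace bad input;
        // REPORT makes the decoder throw instead.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

For a file you would pass in something like `Files.readAllBytes(path)`. A `false` result means the file is definitely not UTF-8; a `true` result only means UTF-8 is a plausible guess.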

Paul Grime
2

Uhm, theoretically, how would you know if it is Unicode?

This is the real question. Truthfully, you cannot know, but you can make a decent guess.

See "Java : How to determine the correct charset encoding of a stream" for more details. :)
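
As one example of such a guess, you can look for a byte order mark (BOM) at the start of the file. This is only a rough sketch: the names `BomSniffer` and `charsetFromBom` are invented here, and I am assuming that "Unicode" in the question means UTF-16, as it often does in Windows tools. Many UTF-8 files have no BOM at all, so an empty result tells you nothing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper for illustration only.
final class BomSniffer {

    /** Guesses a charset from the leading bytes of a file, or returns empty if no BOM is found. */
    static Optional<Charset> charsetFromBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return Optional.of(StandardCharsets.UTF_16BE);
        }
        // Caveat: FF FE 00 00 is the UTF-32LE BOM, which this sketch reports as UTF-16LE.
        if (startsWith(head, 0xFF, 0xFE)) {
            return Optional.of(StandardCharsets.UTF_16LE);
        }
        return Optional.empty();
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare as unsigned, since Java bytes are signed.
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Only the first few bytes of the file are needed for this. If no BOM is present, you are back to heuristics such as the ones discussed in the linked question.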

Community
Haakon Løtveit