
I want to read several text files (e.g. CSV), but I don't know their encoding.

As the text files may contain special characters such as umlauts, choosing the right encoding seems to be crucial.

new BufferedReader(new InputStreamReader(resource.getInputStream(), encoding));

I tried reading with ISO_8859_1, but it did not handle the umlauts properly. So I tried UTF-8, which works.

But I don't know whether this might cause problems with other files in the future, and I never know a file's encoding before reading it.

So what is the best way to read files whose encoding is unknown?
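One likely explanation for the ISO-8859-1 symptom, assuming the files were actually saved as UTF-8: UTF-8 stores each umlaut as two bytes, and ISO-8859-1 turns each of those bytes into a separate character. A minimal sketch (class and method names are hypothetical) that reproduces this with the name "Müller":

```java
import java.nio.charset.StandardCharsets;

public class UmlautMojibake {

    // Encodes the text as UTF-8, then decodes the same bytes as ISO-8859-1.
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00FC (u with umlaut) written as an escape so the source stays ASCII-only
        String name = "M\u00FCller";
        byte[] utf8Bytes = name.getBytes(StandardCharsets.UTF_8);
        System.out.println(name.length() + " chars, " + utf8Bytes.length + " bytes"); // 6 chars, 7 bytes
        // U+00FC is C3 BC in UTF-8; ISO-8859-1 reads those two bytes as U+00C3 and U+00BC
        System.out.println(misdecode(name).equals("M\u00C3\u00BCller")); // true
    }
}
```

Plain ASCII text looks identical in both charsets, which is why the problem only shows up once umlauts appear.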

membersound
Possible duplicate of [Java : How to determine the correct charset encoding of a stream](http://stackoverflow.com/questions/499010/java-how-to-determine-the-correct-charset-encoding-of-a-stream) – Florent Bayle Jan 20 '15 at 14:44

3 Answers


Strictly speaking, the other two answers are right: you have to know what the encoding is to be sure of anything. However, there are libraries that let you make educated guesses about the encoding, for example ICU4J or jchardet.
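ICU4J, for instance, provides a `CharsetDetector` class for this kind of statistical guessing. As a much cruder illustration of what an "educated guess" means, here is a stdlib-only sketch; it is not how ICU4J or jchardet work. It checks for a byte order mark, then tries a strict UTF-8 decode, and otherwise falls back to ISO-8859-1 (that fallback is an assumption; windows-1252 may be a better default for files produced by Windows tools). The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingGuess {

    // Rough heuristic, not a real detector: BOM first, then strict UTF-8, else ISO-8859-1.
    static Charset guess(byte[] data) {
        // UTF-8 byte order mark: EF BB BF
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        // UTF-16 byte order marks: FE FF (big-endian) or FF FE (little-endian).
        // The "UTF-16" charset reads the BOM itself and picks the byte order.
        if (data.length >= 2) {
            int b0 = data[0] & 0xFF;
            int b1 = data[1] & 0xFF;
            if ((b0 == 0xFE && b1 == 0xFF) || (b0 == 0xFF && b1 == 0xFE)) {
                return StandardCharsets.UTF_16;
            }
        }
        // Strict UTF-8 decode: REPORT makes invalid byte sequences throw
        // instead of being replaced with U+FFFD.
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            // Every byte sequence is valid ISO-8859-1, so decoding with it never fails,
            // but the resulting text may still be wrong.
            return StandardCharsets.ISO_8859_1;
        }
    }

    public static void main(String[] args) {
        String name = "M\u00FCller"; // U+00FC = u with umlaut
        System.out.println(guess(name.getBytes(StandardCharsets.UTF_8)));      // UTF-8
        System.out.println(guess(name.getBytes(StandardCharsets.ISO_8859_1))); // ISO-8859-1
    }
}
```

Note that Java's UTF-8 decoder does not strip a leading BOM, so text decoded from a file that starts with EF BB BF begins with U+FEFF, which a CSV parser may treat as part of the first header.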

dcsohl

You have to know the encoding; you cannot read the files correctly without it. Since UTF-8 works, just keep using it. Also check with the producer of the files whether they will keep producing them in UTF-8. They should document this.
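If you stick with UTF-8, one caveat: `new InputStreamReader(stream, charset)` silently replaces invalid bytes with U+FFFD, so a file that is not actually UTF-8 is read without any error. Passing the `CharsetDecoder` returned by `StandardCharsets.UTF_8.newDecoder()` instead makes the reader throw a `MalformedInputException`. A minimal sketch; the method names are hypothetical, and the stream could be the one from `resource.getInputStream()` in the question:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class StrictUtf8Reader {

    // Reads the whole stream as UTF-8. newDecoder() reports malformed input by
    // default, so invalid bytes raise MalformedInputException (a
    // CharacterCodingException) instead of turning into U+FFFD.
    static String readAll(InputStream in) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8.newDecoder()))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        }
    }

    // True if the bytes decode cleanly as UTF-8.
    static boolean isValidUtf8(byte[] data) {
        try {
            readAll(new ByteArrayInputStream(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = "M\u00FCller"; // U+00FC = u with umlaut
        System.out.println(isValidUtf8(name.getBytes(StandardCharsets.UTF_8)));      // true
        System.out.println(isValidUtf8(name.getBytes(StandardCharsets.ISO_8859_1))); // false
    }
}
```

For files on disk, `Files.newBufferedReader(path, StandardCharsets.UTF_8)` behaves the same way: its read methods throw an `IOException` on malformed input rather than substituting characters.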

peter.petrov

It is impossible to programmatically recognize the encoding of a text file. The only way is to try opening it in a text editor with different encodings until the text is readable.
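The "try different encodings" step can also be done in code, and doing so shows why the final choice is a judgment call: the same bytes often decode without any error in more than one charset. A minimal sketch with hypothetical names, checking the UTF-8 bytes of "Müller" against three candidate charsets (the candidate list is an arbitrary choice):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class TryCharsets {

    // Returns the candidates that decode the bytes without any error.
    static List<Charset> decodableAs(byte[] data, List<Charset> candidates) {
        List<Charset> ok = new ArrayList<>();
        for (Charset cs : candidates) {
            try {
                // newDecoder() defaults to REPORT, so invalid input throws
                cs.newDecoder().decode(ByteBuffer.wrap(data));
                ok.add(cs);
            } catch (CharacterCodingException e) {
                // not valid in this charset; skip it
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        byte[] utf8 = "M\u00FCller".getBytes(StandardCharsets.UTF_8); // U+00FC = u with umlaut
        List<Charset> candidates = List.of(
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1, StandardCharsets.US_ASCII);
        // US-ASCII is rejected because bytes above 0x7F are invalid in it;
        // UTF-8 and ISO-8859-1 both decode, with different results.
        for (Charset cs : decodableAs(utf8, candidates)) {
            System.out.println(cs + ": " + new String(utf8, cs));
        }
    }
}
```

ISO-8859-1 accepts every byte value, so it always passes this test; deciding which of the surviving decodings is the readable one still comes down to looking at the text.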

Evgeniy Dorofeev