How to define charset based on the operating system?

Question

For example:

Russian has two charsets, KOI8-R and CP866. On Linux and Unix, KOI8-R works fine, but on Windows and DOS, CP866 works fine. Is there any way to choose the correct charset based on the platform? I want this to work for all languages. Please help; thanks in advance.
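To show why the choice matters, here is a minimal sketch that encodes the same Russian word with both charsets and then decodes the KOI8-R bytes as if they were CP866. It assumes a standard JDK that ships both charsets (Java's names for them are `KOI8-R` and `IBM866`, with `cp866` as an alias). The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

final class CyrillicEncodings {
    // Encodes text with the named charset.
    static byte[] encode(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName));
    }

    // Decodes bytes written in one charset as if they were in another,
    // which is what happens when sender and receiver disagree.
    static String misread(String text, String writtenAs, String readAs) {
        return new String(encode(text, writtenAs), Charset.forName(readAs));
    }

    public static void main(String[] args) {
        String text = "\u041f\u0440\u0438\u0432\u0435\u0442"; // Russian "Privet" (hello)
        System.out.println("KOI8-R: " + Arrays.toString(encode(text, "KOI8-R")));
        System.out.println("cp866:  " + Arrays.toString(encode(text, "IBM866")));
        System.out.println("KOI8-R bytes read as cp866: " + misread(text, "KOI8-R", "IBM866"));
    }
}
```

The byte arrays differ, so a file name encoded for one platform turns into mojibake when decoded with the other charset.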

Could you please explain your problem more thoroughly? What is the client/server setup? What protocols are you using? You cannot solve the problem the way you describe it, but maybe we can give you some hints toward a better solution if we have more information available. — Martin Olsen, Nov 15 '11 at 07:18

score 1 · Answer 1 · answered Nov 15 '11 at 06:51


It would be a very bad idea to determine the charset based solely on the type of operating system. It is, however, very easy to determine the character set at runtime. On Unix, for example, we have the $LC_* family of environment variables (LC_ALL, LC_CTYPE, and so on). In Java it is even easier.
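A minimal sketch of the Unix approach, assuming POSIX precedence for the character-type category (LC_ALL, then LC_CTYPE, then LANG) and locale strings of the form `language[_territory][.codeset][@modifier]`. Parsing the variables only approximates what the C library reports (e.g. via `nl_langinfo(CODESET)`), and locales such as `C` carry no explicit codeset. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

final class PosixLocaleCodeset {
    // POSIX precedence for the character-type category.
    private static final String[] VARS = {"LC_ALL", "LC_CTYPE", "LANG"};

    // Returns the first non-empty locale value found in the given environment.
    static Optional<String> effectiveLocale(Map<String, String> env) {
        for (String name : VARS) {
            String value = env.get(name);
            if (value != null && !value.isEmpty()) {
                return Optional.of(value);
            }
        }
        return Optional.empty();
    }

    // Extracts the codeset part, e.g. "ru_RU.KOI8-R" -> "KOI8-R".
    // Returns empty when the locale names no codeset (e.g. "C").
    static Optional<String> codeset(String locale) {
        int dot = locale.indexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        int at = locale.indexOf('@', dot);
        String cs = at < 0 ? locale.substring(dot + 1) : locale.substring(dot + 1, at);
        return cs.isEmpty() ? Optional.empty() : Optional.of(cs);
    }

    public static void main(String[] args) {
        Optional<String> locale = effectiveLocale(System.getenv());
        System.out.println("locale:  " + locale.orElse("(unset)"));
        System.out.println("codeset: " + locale.flatMap(PosixLocaleCodeset::codeset).orElse("(none)"));
    }
}
```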

A quick search reveals this page as an example: http://www.rgagnon.com/javadetails/java-0505.html

What you want is probably the java.nio.charset.Charset.defaultCharset().name() value.
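A short sketch of reading that value, plus `file.encoding` and `native.encoding` for comparison. One caveat worth knowing: since JDK 18 (JEP 400) `Charset.defaultCharset()` is UTF-8 by default, so on newer JVMs the `native.encoding` property (added in JDK 17) is the better indicator of the OS locale. The helper name `platformEncoding` is made up for illustration.

```java
import java.nio.charset.Charset;

final class JvmEncodings {
    // Best guess at the OS-level encoding: native.encoding when the JVM
    // provides it (JDK 17+), otherwise the JVM's default charset.
    static String platformEncoding() {
        String nativeEncoding = System.getProperty("native.encoding");
        return nativeEncoding != null ? nativeEncoding : Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Charset used by String.getBytes(), readers, writers, etc. when none is given.
        System.out.println("defaultCharset():   " + Charset.defaultCharset().name());
        System.out.println("file.encoding:      " + System.getProperty("file.encoding"));
        System.out.println("platformEncoding(): " + platformEncoding());
    }
}
```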

answered Nov 15 '11 at 06:51

Martin Olsen


My need is based on a client/server operation. The client requests some files. From the request I have to determine the client's platform, and based on that I have to encode the file name on the server and return the response. The server will always be on the same platform; the response has to depend on the client's platform. – Roshan Nov 15 '11 at 06:56

Protocols like HTTP usually include an encoding header. – Martin Olsen Nov 15 '11 at 06:59
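For illustration, a minimal sketch of reading the `charset` parameter from a Content-Type value such as `text/html; charset=KOI8-R`. It is simplified (it splits on `;` and does not handle semicolons inside quoted values), and the class name and fallback behaviour are assumptions rather than any particular library's API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ContentTypeCharset {
    // Returns the charset named in the Content-Type value, or the fallback
    // when the parameter is missing or names a charset this JVM lacks.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            int eq = p.indexOf('=');
            if (eq > 0 && p.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String name = p.substring(eq + 1).trim();
                // Parameter values may be quoted: charset="UTF-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers IllegalCharsetNameException and UnsupportedCharsetException.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=KOI8-R", StandardCharsets.UTF_8).name());
    }
}
```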

If you use a proprietary protocol, you should handle the encoding at that level. Either include a header (like HTTP) or decide on a standard encoding for transfers! – Martin Olsen Nov 15 '11 at 07:01
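A minimal sketch of the "include a header" option for a home-grown protocol: each frame carries the charset name, then the payload length, then the encoded bytes, so the receiver never has to guess the client's platform. The frame layout and the `EncodedFrame` name are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

final class EncodedFrame {
    // Frame layout (hypothetical): charset name, payload length, payload bytes.
    static void write(DataOutputStream out, String text, Charset charset) throws IOException {
        byte[] payload = text.getBytes(charset);
        out.writeUTF(charset.name());
        out.writeInt(payload.length);
        out.write(payload);
        out.flush();
    }

    // Reads one frame and decodes the payload with the declared charset.
    static String read(DataInputStream in) throws IOException {
        Charset charset = Charset.forName(in.readUTF());
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("negative payload length: " + length);
        }
        byte[] payload = new byte[length];
        in.readFully(payload);
        return new String(payload, charset);
    }

    // In-memory convenience: write a frame and read it back.
    static String roundTrip(String text, Charset charset) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            write(new DataOutputStream(buf), text, charset);
            return read(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("report.txt", Charset.forName("KOI8-R")));
    }
}
```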
The user can select any language on the client side, and based on that I have to change the encoding. Is there any available list that defines the proper charset per platform for all charsets, like this one? http://download.oracle.com/javase/1.4.2/docs/guide/intl/encoding.doc.html – Roshan Nov 15 '11 at 07:06

Can you just use UTF-8 for everything when everything fails? – ee. Nov 15 '11 at 07:06
@user828234: No, you **cannot** determine charset based solely on the type of platform. Any platform can have **any** encoding! – Martin Olsen Nov 15 '11 at 07:07
Is there a manually maintained list I could use for reference? Based on it, I would put conditions in the code. – Roshan Nov 15 '11 at 07:08
If the client is in your control (i.e. you have the source, can compile, etc.) why not encode the data to a predetermined charset before sending? – Martin Olsen Nov 15 '11 at 07:10
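A minimal sketch of that suggestion, assuming both ends agree on UTF-8 for the wire regardless of either OS. It also checks whether a legacy charset such as KOI8-R could even represent a given name, which illustrates why a per-language charset table does not scale to "all languages". The class and method names are made up.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class FixedTransferEncoding {
    // The agreed transfer charset, independent of client and server platforms.
    static final Charset WIRE = StandardCharsets.UTF_8;

    static byte[] toWire(String fileName) {
        return fileName.getBytes(WIRE);
    }

    static String fromWire(byte[] bytes) {
        return new String(bytes, WIRE);
    }

    // True if every character of text can be represented in the charset.
    static boolean fits(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String name = "\u0444\u0430\u0439\u043b.txt"; // Cyrillic "fajl.txt"
        System.out.println(toWire(name).length + " bytes on the wire");
        System.out.println("fits KOI8-R: " + fits(name, Charset.forName("KOI8-R")));
    }
}
```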
@user828234: Sorry, but no. I do not think a list like that exists. – Martin Olsen Nov 15 '11 at 07:14
@user828234: There are ways to automatically determine data encoding, but they are (in my experience) messy: http://stackoverflow.com/questions/499010/java-how-to-determine-the-correct-charset-encoding-of-a-stream – Martin Olsen Nov 15 '11 at 07:20

score 1 · Accepted Answer · edited Oct 07 '21 at 06:16

> My need is based on a client/server operation. The client requests some files. From the request I have to determine the client's platform, and based on that I have to encode the file name on the server and return the response. The server will always be on the same platform; the response has to depend on the client's platform.

You seem to be under the impression that client/server protocols are supposed to decide their character encoding based on the client's OS and locale. That is not required. For example, a server is allowed to ignore the HTTP Accept-Charset header. What is required (at least for IETF protocols) is the ability to use UTF-8, and to declare the encoding (e.g., Content-Type: text/html; charset=KOI8-R).

Unless you have a compelling reason to do otherwise, I'd recommend sending your response in UTF-8. That's what ⅔ of the Web does.
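A minimal sketch of sending a text response that declares its encoding, with UTF-8 as the default. Deriving the header and the body from the same `Charset` keeps them consistent, and the byte count (what a Content-Length header would carry) depends on that charset. `TextResponse` is a hypothetical helper, not part of any HTTP library.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class TextResponse {
    final String contentType;
    final byte[] body;

    // Encodes the body and declares the same charset in Content-Type.
    TextResponse(String mimeType, String text, Charset charset) {
        this.contentType = mimeType + "; charset=" + charset.name();
        this.body = text.getBytes(charset);
    }

    // UTF-8 unless there is a compelling reason to use something else.
    TextResponse(String mimeType, String text) {
        this(mimeType, text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        TextResponse r = new TextResponse("text/plain", "\u0444\u0430\u0439\u043b.txt");
        System.out.println("Content-Type: " + r.contentType);
        // Content-Length counts bytes, not characters.
        System.out.println("Content-Length: " + r.body.length);
    }
}
```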

The remaining question is how to determine the file encoding on the server. An approach that works most of the time is:

1. If it validates as UTF-8, then assume it is UTF-8.
2. Otherwise, assume the platform's default encoding (e.g., java.nio.charset.Charset.defaultCharset().name() as recommended by Martin).
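The two steps can be sketched with a strict `CharsetDecoder` that reports malformed input instead of replacing it. The fallback is passed in as a parameter because on JDK 18+ `Charset.defaultCharset()` is UTF-8 by default, in which case a property such as `native.encoding` may be the better fallback. The class and method names are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class GuessEncoding {
    // True if the bytes are well-formed UTF-8 (no malformed or truncated sequences).
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Step 1: UTF-8 if it validates. Step 2: otherwise the given platform default.
    static Charset guess(byte[] bytes, Charset platformDefault) {
        return isValidUtf8(bytes) ? StandardCharsets.UTF_8 : platformDefault;
    }

    public static void main(String[] args) {
        byte[] sample = "\u041f\u0440\u0438\u0432\u0435\u0442".getBytes(Charset.forName("KOI8-R"));
        // KOI8-R bytes are not valid UTF-8, so the fallback argument is returned.
        System.out.println(guess(sample, Charset.defaultCharset()).name());
    }
}
```

Pure ASCII also validates as UTF-8, which is harmless because ASCII-compatible encodings such as KOI8-R decode it identically.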

(If desired, you can also add detection for UTF-32 (with or without a BOM) and/or UTF-16 (with a BOM).)
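A minimal sketch of the BOM part of that detection (it does not attempt BOM-less UTF-32). UTF-32LE has to be tested before UTF-16LE because its mark `FF FE 00 00` begins with the UTF-16LE mark `FF FE`. The UTF-8 BOM check and the `BomSniffer` name are additions for illustration.

```java
import java.util.Optional;

final class BomSniffer {
    // Returns the charset name implied by a leading byte-order mark, if any.
    static Optional<String> fromBom(byte[] b) {
        if (startsWith(b, 0x00, 0x00, 0xFE, 0xFF)) return Optional.of("UTF-32BE");
        if (startsWith(b, 0xFF, 0xFE, 0x00, 0x00)) return Optional.of("UTF-32LE");
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return Optional.of("UTF-8");
        if (startsWith(b, 0xFE, 0xFF)) return Optional.of("UTF-16BE");
        if (startsWith(b, 0xFF, 0xFE)) return Optional.of("UTF-16LE");
        return Optional.empty();
    }

    // Compares unsigned byte values against the expected prefix.
    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};
        System.out.println(fromBom(sample).orElse("no BOM"));
    }
}
```

Java's decoders for the explicit-endian names (`UTF-16LE`, `UTF-16BE`) and for `UTF-8` do not strip the mark, so skip its bytes before decoding. A UTF-16LE file whose first character after the BOM is U+0000 would be misread as UTF-32LE, which is rare in practice.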

score -1 · Answer 3 · edited May 23 '17 at 12:03


Use the sun.jnu.encoding system property.

See "What exactly is sun.jnu.encoding?"
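A minimal sketch of reading it. `sun.jnu.encoding` is an internal, undocumented property of OpenJDK-based JVMs (it is the encoding the JVM uses for file names, command-line arguments and environment variables), so the sketch assumes it may be absent and falls back to the default charset. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;

final class JnuEncoding {
    // Encoding the JVM uses for file names; falls back to the default
    // charset on JVMs that do not define sun.jnu.encoding.
    static String fileNameEncoding() {
        String jnu = System.getProperty("sun.jnu.encoding");
        return jnu != null ? jnu : Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("sun.jnu.encoding: " + System.getProperty("sun.jnu.encoding"));
        System.out.println("file name encoding: " + fileNameEncoding());
    }
}
```

Like the other platform properties, this describes the JVM's own host only; it cannot tell a server anything about a remote client's platform.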

edited May 23 '17 at 12:03

Community


answered Nov 15 '11 at 06:49

ee.


@Martin Olsen Thanks, I didn't explain further... Encoding is always platform-dependent :) – ee. Nov 15 '11 at 06:54
More: [http://happygiraffe.net/blog/2009/09/24/java-platform-encoding/](http://happygiraffe.net/blog/2009/09/24/java-platform-encoding/) – ee. Nov 15 '11 at 07:01
