I've been asked to write code that takes a string and checks whether its encoding matches a specific encoding that we want. I've searched a lot but couldn't find anything. I found a method (getEncoding()), but it only works with files, which is not what I want. I've also been told that I should use the Java standard library, not Mozilla or Apache methods. I'd really appreciate any help. Thanks in advance.
2 Answers
What you are thinking of is "Internationalization". There are libraries for this, like Loc4j, but you can also get this using `java.util.Locale` in Java. However, in general text is just text. It is a token with a certain value. No localization information is stored in the character. This is why a file normally provides the encoding in the header. A console or terminal can also provide localization using certain commands/functions.
Unless you know the source encoding and the tokens used, you have only a limited ability to guess which encoding was used at the other end. If you still want to do this, you will need to go into deeper areas such as decryption, where this kind of thing is usually done with statistical analysis. This in turn requires databases on the usage of different tokens, and depending on the quality of the text, the databases and the algorithms, a certain amount of text is required. Special cases, like writing Swedish with e.g. a US encoding (using a for å and ä, or o for ö), will require more advanced analysis.
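Short of statistical detection, a narrower check is possible when the raw bytes are available: decode them strictly and see whether they form a valid sequence in the encoding you expect. The following is only a minimal sketch; the class and method names (`StrictDecodeCheck`, `isValidFor`) are made up for illustration, and it uses nothing beyond `java.nio.charset`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class StrictDecodeCheck {

    // Returns true if the bytes are a valid byte sequence in the given charset.
    static boolean isValidFor(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   // Fail instead of silently substituting a replacement character.
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "\u00e5" is the letter å, written as an escape so the source file's
        // own encoding does not matter.
        byte[] latin1 = "\u00e5".getBytes(StandardCharsets.ISO_8859_1); // 0xE5
        byte[] utf8 = "\u00e5".getBytes(StandardCharsets.UTF_8);        // 0xC3 0xA5

        System.out.println(isValidFor(latin1, StandardCharsets.UTF_8));    // false
        System.out.println(isValidFor(utf8, StandardCharsets.UTF_8));      // true
        System.out.println(isValidFor(utf8, StandardCharsets.ISO_8859_1)); // true
    }
}
```

Note the last line: the UTF-8 bytes also pass as ISO-8859-1, because every byte value is valid there. A passing check only means the bytes are consistent with the encoding, not that they were actually produced with it, which is why real detection falls back on statistics.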
EDIT
Since I got a comment that encoding and internationalization are different things, I will add some remarks. It is possible to work with different encodings while working purely in English (for example with some English special characters). It is also possible to work with encodings using, for example, `Charset`. However, for many applications that use different encodings it may still be efficient to use `Locale`, since this library can do a lot of operations on text with different encodings.
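As an illustration of the `Charset` route: a Java `String` does not carry an encoding of its own, so one thing that can be checked with the standard library is whether every character of the text is representable in a given charset. This is a hedged sketch, not necessarily what the asker needed; `EncodableCheck` and `canBeEncodedAs` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
final class EncodableCheck {

    // Returns true if every character of the text can be represented in the charset.
    static boolean canBeEncodedAs(String text, Charset charset) {
        // A fresh encoder per call, since CharsetEncoder is not thread-safe.
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(canBeEncodedAs("hello", StandardCharsets.US_ASCII));    // true
        // "\u00e5" is å: outside ASCII, but present in ISO-8859-1.
        System.out.println(canBeEncodedAs("\u00e5", StandardCharsets.US_ASCII));   // false
        System.out.println(canBeEncodedAs("\u00e5", StandardCharsets.ISO_8859_1)); // true
        // "\u20ac" is the euro sign: not in ISO-8859-1, but representable in UTF-8.
        System.out.println(canBeEncodedAs("\u20ac", StandardCharsets.ISO_8859_1)); // false
        System.out.println(canBeEncodedAs("\u20ac", StandardCharsets.UTF_8));      // true
    }
}
```

If the target encoding arrives as a name rather than a constant, `Charset.forName(name)` can be used to look it up; it throws an exception for unknown or illegal names, so the caller should be prepared for that.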

-
Thanks for your answer. Actually I'm new to Java and I don't think they are asking me for such complicated code (well, for me it is). Isn't there any other way? – maryam Jan 23 '17 at 08:47
-
@maryam If you have some kind of string and you want to find its encoding, then you need to compare it against statistics for different encodings. However, since you only want to find out whether the text you have uses a particular encoding, the problem seems to be a different one. The string you use almost certainly has a source (e.g. a console or a file). Looking at the source, you should be able to determine the encoding. In case you know the encoding of all inputs, it would be a minor issue to find out whether a specific character has the same encoding. – patrik Jan 23 '17 at 09:16
-
I don't really know their business or what this will be used for, so I have no idea where this string comes from, and for now I'm running it using @Test. I'm so confused... I don't know what to do. Sometimes I doubt whether I've understood the question correctly, but I asked him yesterday and I was told that I had... – maryam Jan 23 '17 at 10:38
-
I'm so grateful for your answer. Thanks. – maryam Jan 23 '17 at 10:38
-
@maryam Can I assume the issue you had is now solved, then? Otherwise, I assume the input comes from some stream, either from a file or a console. Without this you have no way of knowing which encoding is being used. This is what requires statistical analysis. This kind of analysis is common for webpages supporting internationalization. I am afraid this is the best answer I can give you: either find the source and see what encoding is being used, or else use statistical analysis to find out :(. This is the reason why people tend to use 3rd-party libs for this. Internationalization is a real pain to work with. – patrik Jan 23 '17 at 11:09
-
@patrik Internationalization is not the same as character encoding. It just happens that, when you want to deal with foreign characters, the ASCII character encoding falls short, and you need to resort to encodings that can represent those characters, like ISO-8859-1 or UTF-8. Just like you need to do if you want to support emoticons, which has nothing to do with i18n. You can use UTF-8 and never ever use Locale or support anything other than English. – walen Jan 25 '17 at 07:11
-
@walen You are right. I guess I read too much into this question. Since OP had to deal with characters in different encodings, I assumed he was referring to different locales. Anyway, I will edit this, though I still think `Locale` can be of great use when working with different encodings. – patrik Jan 25 '17 at 11:01
Thanks for your answers and contributions, but these two links did the trick. I had already seen these two pages, but they didn't seem to work for me because I was thinking about getting the encoding directly and then comparing it with the specific one. This is one of them
-
Glad it worked out for you. However, this assumes you actually know the encoding (or get it from, for example, InputStreamReader.getEncoding). Anyway, please provide an example of how you solved the problem, so that more people can get help from this. – patrik Jan 25 '17 at 11:15
-
Thanks. Yes, as you said, I have a particular encoding that is passed via an annotation, and in my validator class I use the text and this encoding to check whether the text matches the encoding or not. If my explanation is not enough, don't hesitate to tell me and I will add the example. – maryam Jan 28 '17 at 08:47
-
Yes, please write an example and a summary. This is not so much a wish from me to see exactly what you have done. The problem is that the answer does not qualify as an answer in its current state. The content of a link must always be explained in case the link changes. In that case your answer would be useless. – patrik Jan 29 '17 at 09:48