1

How can I identify, in Java, whether a given string contains double-byte characters?

Thanks

ewan.chalmers
Sameek Mishra

3 Answers

2

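A String does not have a character set property. Internally it is always UTF-16: each `char` is one 16-bit code unit, and characters outside the Basic Multilingual Plane occupy two `char`s (a surrogate pair).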
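To illustrate the difference between a String's internal form and an encoded byte sequence, here is a minimal sketch. The class name `EncodedLengthDemo` and the helper `encodedLength` are hypothetical names chosen for this example; only `String.getBytes(Charset)` and `StandardCharsets` come from the JDK (Java 7 or later is assumed).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodedLengthDemo {
    // Hypothetical helper: number of bytes the text occupies once it is
    // encoded in the given charset. The String itself carries no charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String ascii = "abc";
        // Three Japanese characters, written as escapes so the example
        // does not depend on the source file encoding.
        String cjk = "\u65E5\u672C\u8A9E";

        // Both strings have three chars (UTF-16 code units).
        System.out.println(ascii.length());
        System.out.println(cjk.length());

        // The byte count only exists once an encoding is chosen.
        System.out.println(encodedLength(ascii, StandardCharsets.UTF_8));
        System.out.println(encodedLength(cjk, StandardCharsets.UTF_8));
        System.out.println(encodedLength(cjk, StandardCharsets.UTF_16BE));
    }
}
```

The same three-character string takes a different number of bytes in UTF-8 than in UTF-16, which is why "double byte" is a property of an encoding rather than of the String.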

Andreas Dolk
  • I think you mean that it's always UTF-16 (though that doesn't always mean that 16 bits is used for each character either) – Adam Batkin Jul 06 '11 at 12:06
  • @Adam Batkin - yes, my mistake, changed it already and linked it to the java language spec. – Andreas Dolk Jul 06 '11 at 12:07
  • I have a text field in which a user can enter data in Chinese, Japanese, English, or Korean. So we need to figure out a way to read these double-byte characters. – Sameek Mishra Jul 06 '11 at 12:10
  • @sam the text field will store the character in a *"double byte"* `String` internally. If conversion is needed from a charset to UTF-16, then the component will take care of it. *You* get a `String` in UTF-16 from the textfield. – Andreas Dolk Jul 06 '11 at 12:31
  • @Andreas_D, you might find the -XX:+UseCompressedStrings option interesting. ;) http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html – Peter Lawrey Jul 06 '11 at 12:36
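For the text-field case raised in the comments, one possible approach is to inspect the Unicode script of each code point instead of its byte width. The following is a minimal sketch under that assumption, not a method proposed by any of the answers: the class `CjkCheck` and its methods are hypothetical names, and the choice of scripts (Han, Hiragana, Katakana, Hangul) is an assumption about what "Chinese, Japanese, or Korean" should cover. `Character.UnicodeScript` requires Java 7 or later.

```java
class CjkCheck {
    // Hypothetical helper: true if the code point belongs to one of the
    // scripts assumed here to represent Chinese, Japanese, or Korean text.
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
                || script == Character.UnicodeScript.HIRAGANA
                || script == Character.UnicodeScript.KATAKANA
                || script == Character.UnicodeScript.HANGUL;
    }

    // Walks the string by code point, so surrogate pairs are handled
    // as single characters.
    static boolean containsCjk(String text) {
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (isCjk(codePoint)) {
                return true;
            }
            i += Character.charCount(codePoint);
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsCjk("hello"));        // Latin only
        System.out.println(containsCjk("\uD55C\uAE00")); // Hangul syllables
    }
}
```

Note that this check ignores punctuation and full-width forms, which belong to other scripts, so it may need adjusting for real input.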
1

If you wanted to try to discover the likely charset of some input data (e.g. in a file or stream), then the ICU4J CharsetDetector could be used.

But by the time the data is in a String in your code, it is too late.

ewan.chalmers
0

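Your string contains characters that need more than one `char` (supplementary characters, stored as surrogate pairs) if `String.codePointCount(int beginIndex, int endIndex)` over the whole text range returns a value smaller than `length()`. Note that most Chinese, Japanese, and Korean characters fit in a single `char`, so this check does not detect them.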
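A minimal sketch of a `codePointCount`-based check follows. The class `SupplementaryCheck` and the method `hasSupplementary` are hypothetical names for this example. It compares the number of code points with the number of `char`s, so it reports only characters stored as surrogate pairs.

```java
class SupplementaryCheck {
    // Hypothetical helper: true if at least one character in the text is
    // stored as a surrogate pair, i.e. there are fewer code points than chars.
    static boolean hasSupplementary(String text) {
        return text.codePointCount(0, text.length()) < text.length();
    }

    public static void main(String[] args) {
        // U+65E5 is inside the Basic Multilingual Plane: one char.
        System.out.println(hasSupplementary("\u65E5"));
        // U+20BB7 is outside it: two chars (a surrogate pair).
        System.out.println(hasSupplementary("\uD842\uDFB7"));
    }
}
```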

Gedrox