
I was looking at the Javadoc for the CharSequence interface, implemented by String, StringBuffer, and a few others, more specifically at the chars() method, and the Javadoc says:

Returns a stream of int zero-extending the char values from this sequence.

Now, I know that it returns int values, and that an int can be cast to char. But what does "zero-extending" mean?

MikaelF
  • Probably the two least significant bytes hold the char value and the other two (the most significant bytes) are zeroes, so the int is numerically equal to the char value. – huseyin tugrul buyukisik Feb 11 '17 at 01:46
  • My guess would be that the 16 bit char is made into a 32 bit int by setting the most significant 16 bits to 0. – David Choweller Feb 11 '17 at 01:51
  • char `bb` is mapped to int `00bb`. The funny thing is, they don't have CharStream etc. It is really tiresome to provide all the primitive versions of streams and functions, and they stopped at `int, long, double`. – ZhongYu Feb 11 '17 at 01:52

1 Answer


An int is a 32-bit value and a char is a 16-bit value. Zero-extending just means that the 16 higher-order "unused" bits in the int are set to zero.
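As a rough illustration, the sketch below collects the values from chars() and prints them in decimal and hex. The class name `ZeroExtendDemo` and the helper `charsAsInts` are made up for this example; only `CharSequence.chars()`, `IntStream.toArray()` and `Integer.toHexString` come from the JDK.

```java
final class ZeroExtendDemo {
    // Hypothetical helper: collects the int values that CharSequence.chars() produces.
    static int[] charsAsInts(CharSequence s) {
        return s.chars().toArray();
    }

    public static void main(String[] args) {
        // '\uFFFF' has all 16 bits set; as an int its upper 16 bits stay zero.
        int[] values = charsAsInts("A\uFFFF");
        for (int v : values) {
            System.out.println(v + " = 0x" + Integer.toHexString(v));
        }
        // Casting back to char recovers the original character.
        System.out.println((char) values[0]);
    }
}
```

Even the char with every bit set comes out as 65535 (0xffff) rather than -1, which is what you would expect if the upper bits are simply filled with zeroes.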

I am guessing it is documented because, looked at in one way, the operation treats the char value as a 16-bit integer, and when converting from an integer to a larger integer, the user of a library method such as this must know how the sign is treated.

For those that don't know, a signed integer value reserves its highest-order bit as a 'sign bit'; if it is 1, the number is negative. When converting to a larger integer, if the highest-order bit is copied into all the 'extra' bits in the new value, we say the conversion is "sign extended". Only if this is done will the new integer have the same signed numeric value as the smaller signed integer. If the smaller integer value is unsigned, then the highest-order bit represents a value just like all the other bits, and only without sign extension will the values be the same.
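To make the difference concrete, here is a small sketch that widens the same 16-bit pattern once as a signed short and once as a char. The class and method names are invented for this example; the widening itself is ordinary Java assignment conversion, and `Short.toUnsignedInt` is a JDK method available since Java 8.

```java
final class SignVsZeroExtension {
    // Widening a short copies the sign bit into the upper 16 bits (sign extension).
    static int widenSigned(short s) {
        return s;
    }

    // Widening a char fills the upper 16 bits with zeroes (zero extension).
    static int widenUnsigned(char c) {
        return c;
    }

    public static void main(String[] args) {
        short s = (short) 0x8000; // highest-order bit set
        char c = (char) 0x8000;   // same 16-bit pattern

        System.out.println(Integer.toHexString(widenSigned(s)));   // ffff8000
        System.out.println(Integer.toHexString(widenUnsigned(c))); // 8000
        System.out.println(widenSigned(s));   // -32768
        System.out.println(widenUnsigned(c)); // 32768

        // Zero-extending a short explicitly gives the same result as the char case.
        System.out.println(Short.toUnsignedInt(s)); // 32768
    }
}
```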

The Java language has no unsigned integer type other than char, so a conversion without sign extension could be regarded as unusual.

If one had a constant representing the (signed) integer value of a particular character, then after conversion to a 32-bit integer, the constant would only still be accurate if the conversion included sign extension. So it is important to know how any given conversion treats the original value.

arcy