Java and unsigned values

Question

I'm parsing unsigned values from a DatagramSocket. I have a total of 24 bits (3 bytes) coming in: one unsigned 8-bit integer followed by one signed 16-bit integer. But Java never stores anything other than a signed byte in a byte or byte array, does it? When Java takes in these values, do you lose that last 8th bit?

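        // needs java.net.DatagramSocket and java.net.DatagramPacket
        int port = 666;
        DatagramSocket serverSocket = new DatagramSocket(port);
        byte[] receiveData = new byte[3]; // <-- Now at this moment I lost my 8th bit

        System.out.println("Binary Server Listening on Port: " + port);

        while (true)
        {
            // receive() blocks until a datagram arrives and fills receiveData
            DatagramPacket receivePacket = new DatagramPacket(receiveData, receiveData.length);
            serverSocket.receive(receivePacket);
            byte[] bArray = receivePacket.getData();
            byte b = bArray[0]; // the unsigned 8-bit field, stored in a signed byte

        }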

[image not preserved]

Did I now lose this 8th bit since I turned it into a byte? Was it wrong to initialize a byte array of 3 bytes?

I would love to have unsigned variables (not only `char`). They would make some code easier and would allow the compiler to make some optimizations that are only valid for unsigned values. — MrSmith42, Jan 02 '13 at 20:59
@MrSmith The compiler can do whatever it needs to do, as all operations on signed values are well-defined in Java. — starblue, Jan 09 '13 at 13:06
@starblue: I mean that for unsigned values some optimizations are allowed which are not allowed for signed values. E.g. `a % 4` can be optimized to `a & 3`, which is only valid for non-negative numbers. => A Java compiler is not allowed to do this optimization, even if the developer only uses positive numbers. — MrSmith42, Jan 09 '13 at 13:09
@MrSmith You are right, unsigned division is one of the few things that is not easily emulated with operations on signed numbers. Java 8 will fix that by adding the missing operations for unsigned numbers: https://blogs.oracle.com/darcy/entry/unsigned_api — starblue, Jan 10 '13 at 09:41
http://stackoverflow.com/questions/397867/port-of-random-generator-from-c-to-java/397997#397997 — starblue, Jan 10 '13 at 09:44
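
To make the point from the comments concrete: for non-negative `a`, `a % 4` equals `a & 3`, but for negative `a` the two differ, because Java's `%` takes the sign of the dividend. The sketch below shows the difference and the Java 8 `Integer.remainderUnsigned` helper from the unsigned API mentioned above. The class and method names are made up for illustration.

```java
class RemainderVsMask {
    /** Signed remainder: the result takes the sign of the dividend. */
    static int signedRem4(int a) {
        return a % 4;
    }

    /** Keeps only the low two bits, so the result is always in 0..3. */
    static int mask4(int a) {
        return a & 3;
    }

    /** Java 8+: interprets the 32 bits of a as an unsigned value. */
    static int unsignedRem4(int a) {
        return Integer.remainderUnsigned(a, 4);
    }

    public static void main(String[] args) {
        System.out.println(signedRem4(7) + " " + mask4(7));   // 3 3
        System.out.println(signedRem4(-5) + " " + mask4(-5)); // -1 3
        System.out.println(unsignedRem4(-5));                 // 3
    }
}
```

The mask and the unsigned remainder agree for every bit pattern; the signed remainder only agrees when the value is non-negative.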

score 12 · Accepted Answer · answered Jan 02 '13 at 20:58

When java takes in these values, do you lose that last 8th bit?

No. You just end up with a negative value when it's set.

So to get a value between 0 and 255, it's simplest to use something like this:

int b = bArray[0] & 0xff;

First the byte is promoted to an int, which will sign extend it, leading to 25 leading 1 bits if the high bit is 1 in the original value. The & 0xff then gets rid of the first 24 bits again :)
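
Applied to the three-byte packet from the question, the masking approach might look like the sketch below. The question does not state the byte order of the 16-bit field, so big-endian (network order) is assumed here; the class and method names are illustrative only.

```java
class PacketFields {
    /** Unsigned 8-bit value in the first byte, as an int in 0..255. */
    static int unsignedByte(byte[] data) {
        return data[0] & 0xff;
    }

    /** Signed 16-bit value in bytes 1..2. ASSUMPTION: big-endian byte order. */
    static short signedShort(byte[] data) {
        // Mask each byte before combining so sign extension cannot leak in;
        // the final cast to short restores the sign of the 16-bit value.
        return (short) (((data[1] & 0xff) << 8) | (data[2] & 0xff));
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0x80, (byte) 0xff, (byte) 0xfe };
        System.out.println(data[0]);            // -128 (raw signed byte)
        System.out.println(unsignedByte(data)); // 128
        System.out.println(signedShort(data));  // -2
    }
}
```

The three-byte array keeps all 24 bits intact; only the interpretation changes when a byte is widened.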

Jon Skeet


If space is an issue, you don't have to convert to an `int`. You can still use `byte` and treat it as binary data. – Code-Apprentice Jan 02 '13 at 21:00
@Code-Guru could you demonstrate this. Thank you for your response – stackoverflow Jan 02 '13 at 21:07
@stackoverflow: It's hard to demonstrate it without knowing what you're trying to do with the value. But the data itself would indeed be safe. – Jon Skeet Jan 02 '13 at 21:07
@stackoverflow I'm not sure what you want me to demonstrate. The data in a `byte` is binary bits. It's more about how you *think* about the data than about how the computer actually stores it. – Code-Apprentice Jan 02 '13 at 21:09
@stackoverflow see my answer. I have done quite a bit of manipulation on unsigned values (by processing network packets where unsigned values are common) and I have figured out what is really happening. And it is not pretty... – fge Jan 02 '13 at 21:11
@JonSkeet The result in obtaining this value is technically considered widening, correct? So what would you do in the case you want to represent an unsigned Long? There is nothing more 'wider' in java to obtain these values. Are we forced into using BigInteger in these cases? – stackoverflow Jan 20 '13 at 16:49
@Mrshll187: Well the widening part is just the promotion of `byte` to `int`. The masking is separate. There's no simple way of representing an unsigned `long` in Java. Guava (http://guava-libraries.googlecode.com) has an `UnsignedLong` class which can help. – Jon Skeet Jan 20 '13 at 16:59
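
For the unsigned `long` case raised in the comments: besides Guava's `UnsignedLong`, Java 8 and later provide static helpers on `Long` that interpret the same 64 bits as an unsigned value. A small sketch using only those JDK methods follows; the class and wrapper method names are illustrative.

```java
class UnsignedLongDemo {
    /** Formats the 64 bits of the argument as an unsigned decimal number. */
    static String asUnsignedDecimal(long bits) {
        return Long.toUnsignedString(bits);
    }

    /** True if a is greater than b when both are read as unsigned. */
    static boolean unsignedGreater(long a, long b) {
        return Long.compareUnsigned(a, b) > 0;
    }

    /** Unsigned division by two. */
    static long unsignedHalf(long bits) {
        return Long.divideUnsigned(bits, 2L);
    }

    public static void main(String[] args) {
        long allOnes = -1L; // bit pattern 0xFFFFFFFFFFFFFFFF
        System.out.println(asUnsignedDecimal(allOnes));   // 18446744073709551615
        System.out.println(unsignedGreater(allOnes, 1L)); // true
        System.out.println(unsignedHalf(allOnes));        // 9223372036854775807
    }
}
```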

fge · Answer 2 · 2013-01-05T01:35:19.267

No, you do not lose the 8th bit. But unfortunately, Java has two "features" which make it harder than reasonable to deal with such values:

all of its primitive integer types except char are signed;
when widening a primitive type to another primitive type with a greater size (for instance, reading a byte into an int, as is the case here), the sign bit of the narrower type is copied into all the new high-order bits (sign extension).

Which means that, for instance, if you read byte 0x80, which translates in binary as:

1000 0000

when you read it as an integer, you get:

1111 1111 1111 1111 1111 1111 1000 0000
                              ^
                              This freaking bit gets expanded!

whereas you really wanted:

0000 0000 0000 0000 0000 0000 1000 0000

i.e., integer value 128. You therefore MUST mask it:

int b = array[0] & 0xff;

1111 1111 1111 1111 1111 1111 1000 0000 <-- byte read as an int, your original value of b
0000 0000 0000 0000 0000 0000 1111 1111 <-- mask (0xff)
--------------------------------------- <-- ANDed, gives
0000 0000 0000 0000 0000 0000 1000 0000 <-- expected result

Sad, but true.

More generally: if you wish to manipulate a lot of byte-oriented data, I suggest you have a look at ByteBuffer; it can help a lot. Unfortunately, it won't save you from bitmask manipulations when you need unsigned values; it just makes it easier to read a given number of bytes at a time (as primitive types).
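
As a rough illustration of the ByteBuffer suggestion, the sketch below reads the question's three-byte payload. It assumes the 16-bit field is big-endian, which the question does not specify; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class PacketBufferDemo {
    /** Returns { unsigned 8-bit field, signed 16-bit field } from a 3-byte payload. */
    static int[] parse(byte[] payload) {
        // ASSUMPTION: big-endian order. This is ByteBuffer's default;
        // it is set explicitly here only to make the assumption visible.
        ByteBuffer buf = ByteBuffer.wrap(payload).order(ByteOrder.BIG_ENDIAN);
        int first = buf.get() & 0xff;  // the mask is still needed for an unsigned read
        short second = buf.getShort(); // already a signed 16-bit value, no mask needed
        return new int[] { first, second };
    }

    public static void main(String[] args) {
        int[] fields = parse(new byte[] { (byte) 0x80, (byte) 0xff, (byte) 0xfe });
        System.out.println(fields[0]); // 128
        System.out.println(fields[1]); // -2
    }
}
```

With a DatagramPacket, the array passed to `ByteBuffer.wrap` would be the one returned by `getData()`.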

Very helpful and informative with your demonstration. I greatly appreciate your response — stackoverflow, Jan 02 '13 at 21:13
@stackoverflow except that I coded an int on 16 bits whereas it is 32 :) Fixed, but the principle stays the same. — fge, Jan 02 '13 at 21:15
Yeah I noticed but I got what you were trying to say. thanks again — stackoverflow, Jan 02 '13 at 21:20

Code-Apprentice · Answer 3 · 2013-01-02T21:42:47.083

2

In Java, byte (as well as short, int and long) is a signed numeric data type. However, this does not imply any loss of data when treating the values as unsigned binary data. As your illustration shows, 10000000 is -128 as a signed decimal number. If you are dealing with binary data, just treat it as its binary form and you will be fine.
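
One way to read "treat it as its binary form" is sketched below: the byte is inspected bit by bit without ever being stored as an unsigned number. The class and method names are invented for this example.

```java
class RawByteDemo {
    /** True if the high (8th) bit is set. */
    static boolean highBitSet(byte b) {
        return (b & 0x80) != 0;
    }

    /** The byte's bit pattern as an 8-character string of 0s and 1s. */
    static String bits(byte b) {
        String s = Integer.toBinaryString(b & 0xff);
        return "00000000".substring(s.length()) + s; // left-pad to 8 digits
    }

    public static void main(String[] args) {
        byte b = (byte) 0x80;
        System.out.println(b);             // -128
        System.out.println(bits(b));       // 10000000
        System.out.println(highBitSet(b)); // true
    }
}
```

The printed decimal value is negative, but the bit pattern, which is what goes over the wire, is unchanged.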

edited Jan 02 '13 at 21:42

answered Jan 02 '13 at 20:59


By the way: `char` is an unsigned datatype – MrSmith42 Jan 02 '13 at 21:00
@MrSmith42 Thanks for the catch. I qualified my statement to hopefully be more accurate. – Code-Apprentice Jan 02 '13 at 21:02
`char` is a *numeric unsigned* data type with values 0 .. 65535 . But I think it should not be used as a general purpose numeric type. – MrSmith42 Jan 02 '13 at 21:09
@MrSmith42 I was trying to avoid wordiness and possibly leaving out applicable types. Hopefully my newest edit is more accurate. What do you think? – Code-Apprentice Jan 02 '13 at 21:43
