76

I'm reading a binary file like this:

InputStream in = new FileInputStream( file );
byte[] buffer = new byte[1024];
while (in.read(buffer) > -1) {

   int a = // ??? 
}

What I want to do is read up to 4 bytes and create an int value from them, but I don't know how to do it.

I kind of feel like I have to grab 4 bytes at a time and perform some bit operations ( like >> << >> & FF and stuff like that ) to create the new int.

What's the idiom for this?

EDIT

Oops, this turns out to be a bit more complex ( to explain ).

What I'm trying to do is, read a file ( may be ascii, binary, it doesn't matter ) and extract the integers it may have.

For instance suppose the binary content ( in base 2 ) :

00000000 00000000 00000000 00000001
00000000 00000000 00000000 00000010

The integer representation should be 1, 2, right? 1 for the first 32 bits, and 2 for the remaining 32 bits.

11111111 11111111 11111111 11111111

Would be -1

and

01111111 11111111 11111111 11111111

Would be Integer.MAX_VALUE ( 2147483647 )

cletus
OscarRyz

12 Answers

75

ByteBuffer has this capability, and is able to work with both little and big endian integers.

Consider this example:


//  read the file into a byte array
File file = new File("file.bin");
FileInputStream fis = new FileInputStream(file);
byte [] arr = new byte[(int)file.length()];
fis.read(arr);

//  create a byte buffer and wrap the array
ByteBuffer bb = ByteBuffer.wrap(arr);

//  if the file uses little endian as opposed to network
//  (big endian, Java's default) format,
//  then set the byte order of the ByteBuffer
if(use_little_endian)
    bb.order(ByteOrder.LITTLE_ENDIAN);

//  read your integers using ByteBuffer's getInt().
//  four bytes converted into an integer!
System.out.println(bb.getInt());

Hope this helps.
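To pull every integer out of the array rather than just the first, one option is to loop while at least four bytes remain. The following is a minimal sketch, not part of the original answer: the class name `IntExtractor` and the helper `readInts` are hypothetical, and leftover bytes at the end (fewer than four) are simply ignored.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

class IntExtractor {
    // Hypothetical helper: decodes every complete 4-byte group in data.
    public static List<Integer> readInts(byte[] data, ByteOrder order) {
        ByteBuffer bb = ByteBuffer.wrap(data).order(order);
        List<Integer> out = new ArrayList<>();
        // remaining() counts the bytes not yet consumed by getInt()
        while (bb.remaining() >= 4) {
            out.add(bb.getInt());
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = {
            0, 0, 0, 1,
            0, 0, 0, 2,
            (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF,
            0x7F                      // trailing byte, not a full int
        };
        System.out.println(readInts(data, ByteOrder.BIG_ENDIAN)); // prints [1, 2, -1]
    }
}
```

Checking `remaining()` before each `getInt()` avoids the `BufferUnderflowException` that would otherwise be thrown when the input length is not a multiple of 4.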

Tom
39

If you have them already in a byte[] array, you can use:

int result = ByteBuffer.wrap(bytes).getInt();

source: here

Community
iTEgg
30

You should put it into a function like this:

public static int toInt(byte[] bytes, int offset) {
  int ret = 0;
  for (int i=0; i<4 && i+offset<bytes.length; i++) {
    ret <<= 8;
    ret |= bytes[i + offset] & 0xFF;  // index from offset; mask to undo sign extension
  }
  return ret;
}

Example:

byte[] bytes = new byte[]{-2, -4, -8, -16};
System.out.println(Integer.toBinaryString(toInt(bytes, 0)));

Output:

11111110111111001111100011110000

This takes care of running out of bytes and correctly handling negative byte values.

I'm unaware of a standard function for doing this.

Issues to consider:

  1. Endianness: different CPU architectures put the bytes that make up an int in different orders. Depending on how you come up with the byte array to begin with you may have to worry about this; and

  2. Buffering: if you grab 1024 bytes at a time and start a sequence at element 1022 you will hit the end of the buffer before you get 4 bytes. It's probably better to use some form of buffered input stream that does the buffering automatically so you can just use readByte() repeatedly and not worry about it otherwise;

  3. Trailing Buffer: the end of the input may be an uneven number of bytes (not a multiple of 4 specifically) depending on the source. But if you create the input to begin with and being a multiple of 4 is "guaranteed" (or at least a precondition) you may not need to concern yourself with it.
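To make the endianness point concrete, here is a small sketch (the class `EndianDemo` and its `decode` method are made up for illustration) that interprets the same four bytes in both byte orders using `java.nio.ByteBuffer`:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class EndianDemo {
    // Hypothetical helper: decodes exactly four bytes in the given byte order.
    public static int decode(byte[] fourBytes, ByteOrder order) {
        return ByteBuffer.wrap(fourBytes).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] bytes = {0, 0, 0, 1};
        // Big endian: the last byte is the least significant one.
        System.out.println(decode(bytes, ByteOrder.BIG_ENDIAN));    // prints 1
        // Little endian: the last byte is the most significant one (1 << 24).
        System.out.println(decode(bytes, ByteOrder.LITTLE_ENDIAN)); // prints 16777216
    }
}
```

Which order is right depends entirely on how the file was written, so it has to be known (or agreed on) in advance.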

To further elaborate on the point of buffering, consider the BufferedInputStream:

InputStream in = new BufferedInputStream(new FileInputStream(file), 1024);

Now you have an InputStream that automatically buffers 1024 bytes at a time, which is a lot less awkward to deal with. This way you can happily read 4 bytes at a time and not worry about too much I/O.

Secondly you can also use DataInputStream:

DataInputStream in = new DataInputStream(new BufferedInputStream(
                     new FileInputStream(file), 1024));
byte b = in.readByte();

or even:

int i = in.readInt();

and not worry about constructing ints at all.
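As a rough sketch of how the `DataInputStream` approach might look when reading every integer until the input ends: the class `StreamInts` and method `readAll` below are hypothetical names, and the example feeds a `ByteArrayInputStream` instead of a real file so that it is self-contained. `readInt()` assumes big-endian data and signals end of input with an `EOFException`.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class StreamInts {
    // Hypothetical helper: reads big-endian ints until the stream is exhausted.
    public static List<Integer> readAll(InputStream raw) {
        List<Integer> out = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(raw))) {
            while (true) {
                out.add(in.readInt());
            }
        } catch (EOFException endOfInput) {
            // Normal termination: fewer than 4 bytes were left.
            // Any trailing partial group is dropped.
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = {0, 0, 0, 1, 0, 0, 0, 2, 5}; // last byte is a leftover
        System.out.println(readAll(new ByteArrayInputStream(data))); // prints [1, 2]
    }
}
```

For a real file, `new FileInputStream(file)` would take the place of the `ByteArrayInputStream`.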

cletus
  • I just have to consider the fact my array might not read exact `% 4` bytes right? – OscarRyz Mar 04 '10 at 22:46
  • If the array's length is not %4, then you can pad the remaining bytes with 0. (Since x | 0 := x and 0 << n := 0). – Pindatjuh Mar 04 '10 at 22:49
  • Isn't DataInputStream or using RandomAccessFile easier? This way you can just do in.readInt(). – Taylor Leese Mar 04 '10 at 22:52
  • Because there are up to 256 integers in 1024 bytes, and reading one at a time would hit the disk 256x more times, wouldn't it? – OscarRyz Mar 04 '10 at 23:04
  • @Oscar: I think the point is that some of the Java IO classes will do this buffering automatically for you. – cletus Mar 04 '10 at 23:05
  • @Oscar - That depends on how you setup your stream. There's no reason you couldn't read the entire file into a BufferedInputStream and then wrap that with a DataInputStream and call readInt() in a loop. This would prevent what you are talking about. – Taylor Leese Mar 04 '10 at 23:08
  • 3
    One MAJOR problem with your code -- java's byte type is SIGNED, so if the top bit of any byte is set, your code will also set all the upper bits in the resulting int. You need to mask off the upper bits of each byte before shifting and oring, eg `(bytes[0] & 0xff) | ((bytes[1] & 0xff) << 8) | ...` – Chris Dodd Mar 04 '10 at 23:19
  • It's better to use the standard library to handle byte -> int conversions than to hand code. Java even provides a library for doing this with different endianness, see java.nio.ByteBuffer. – Kevin Brock Mar 05 '10 at 12:18
  • @Chris Dodd, you helped me fix my network code, the & 0xFF fixed my problems! Thanks! – lfxgroove Jul 16 '12 at 08:14
  • 1
    I hate to say this, but your offset support is completely broken. See http://ideone.com/uCpovu, where I also have the fix. – quantum Dec 02 '12 at 21:10
  • I suggest to change the iteration construct to `for (int i=offset; i<4+offset && i – jackb Apr 26 '16 at 21:57
  • 1
    Thanks for the code snippet, i should point out a bug here - `ret |= (int)bytes[i] & 0xFF;` should really be `ret |= (int)bytes[i + offset] & 0xFF;` - otherwise the offset param is ignored completely. – Ying Feb 09 '17 at 00:23
  • it is great, but you should use bytes[i+offset] instead bytes[i] – Mikhail Ionkin Mar 18 '18 at 14:37
19

just see how DataInputStream.readInt() is implemented;

    int ch1 = in.read();
    int ch2 = in.read();
    int ch3 = in.read();
    int ch4 = in.read();
    if ((ch1 | ch2 | ch3 | ch4) < 0)
        throw new EOFException();
    return ((ch1 << 24) + (ch2 << 16) + (ch3 << 8) + (ch4 << 0));
Santhosh Kumar Tekuri
  • 9
    It should be noted that this is for big-endian ordered bytes, whereas support for little-endian only takes a small change: return ((ch4 << 24) + (ch3 << 16) + (ch2 << 8) + (ch1 << 0)); – Paul Gregoire Sep 16 '11 at 20:57
  • It is not correct. E.g., if 4th byte equals -1, and others are 0, your result is -1, but should be 255. int k = ((byte)-1) << 0; System.err.println(k); // -1 – Mikhail Ionkin Mar 18 '18 at 14:51
  • @MikhailIonkin Your comment is wrong, and this code is correct. in.read() does not return a byte. If it did, sign extension would occur when it was stored in an int variable. But in.read() returns the next byte of the stream converted to int WITHOUT sign extension. So If the next byte of the stream is 0xFF, in.read() would return 0x000000FF. The only way in.read() will return -1 is when you reach the end of the stream. – Craig Parton Aug 25 '18 at 13:21
  • @CraigParton yes, but question is how to convert **4 bytes**, not **4 ints** – Mikhail Ionkin Aug 26 '18 at 11:12
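The disagreement in these comments comes down to the difference between a value returned by `InputStream.read()` and a value taken from a `byte[]`. A small sketch (the class and method names are invented for illustration) showing both cases for the byte 0xFF:

```java
import java.io.ByteArrayInputStream;

class ReadVsByte {
    // read() returns the next byte as an int in the range 0..255 (or -1 at end of stream).
    public static int viaRead(byte[] data) {
        return new ByteArrayInputStream(data).read();
    }

    // Widening a byte to int sign-extends it.
    public static int viaArray(byte b) {
        return b;
    }

    // Masking with 0xFF recovers the unsigned value.
    public static int masked(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xFF};
        System.out.println(viaRead(data));     // prints 255
        System.out.println(viaArray(data[0])); // prints -1
        System.out.println(masked(data[0]));   // prints 255
    }
}
```

So the shift-and-add expression is safe for values that came from `read()`, but the same expression applied to elements of a `byte[]` needs `& 0xFF` on each byte.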
5

The easiest way is:

RandomAccessFile in = new RandomAccessFile("filename", "r"); 
int i = in.readInt();

-- or --

DataInputStream in = new DataInputStream(new BufferedInputStream(
    new FileInputStream("filename"))); 
int i = in.readInt();
Taylor Leese
  • 1
    assuming that his binary file contains big endian signed ints. otherwise it'll fail. horribly. :) – stmax Mar 04 '10 at 22:54
4

try something like this:

a = buffer[3] & 0xFF;             // mask: Java bytes are signed
a = a*256 + (buffer[2] & 0xFF);
a = a*256 + (buffer[1] & 0xFF);
a = a*256 + (buffer[0] & 0xFF);

this is assuming that the lowest byte comes first. if the highest byte comes first you might have to swap the indices (go from 0 to 3).

basically for each byte you want to add, you first multiply a by 256 (which equals a shift to the left by 8 bits) and then add the new byte.
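As a quick check that multiplying by 256 and shifting left by 8 bits give the same result, here is a sketch with two hypothetical helpers that both decode a little-endian int from the first four bytes of an array. Each byte is masked with `0xFF` because Java bytes are signed.

```java
class LittleEndianCombine {
    // Multiply-and-add form, lowest byte first in the array.
    public static int byMultiply(byte[] b) {
        int a = b[3] & 0xFF;
        a = a * 256 + (b[2] & 0xFF);
        a = a * 256 + (b[1] & 0xFF);
        a = a * 256 + (b[0] & 0xFF);
        return a;
    }

    // Equivalent shift-and-or form.
    public static int byShift(byte[] b) {
        return (b[0] & 0xFF)
             | ((b[1] & 0xFF) << 8)
             | ((b[2] & 0xFF) << 16)
             | ((b[3] & 0xFF) << 24);
    }

    public static void main(String[] args) {
        byte[] one = {1, 0, 0, 0};
        byte[] minusTwo = {(byte) 0xFE, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        System.out.println(byMultiply(one) + " " + byShift(one));           // prints 1 1
        System.out.println(byMultiply(minusTwo) + " " + byShift(minusTwo)); // prints -2 -2
    }
}
```

The multiplication overflows for the top byte in the same way the shift does, so both forms agree for negative results as well.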

stmax
  • Although I conceptually agree with Andrey, I'd hope any decent compiler would figure that out and fix it for you. However, << IS clearer for this purpose. – Bill K Mar 04 '10 at 22:57
  • @Andrey: to be fair, the Java compiler will probably translate `x * 256` into `x << 8` automatically. – cletus Mar 04 '10 at 22:57
  • depends on quality of compiler :) – Andrey Mar 05 '10 at 10:49
  • It's not because of the "faster" code that you should use `<<`, it's because of readability. By using `<<`, it is clear that we are doing bit operations rather than multiplication. In fact, I'd even change the `+`s to `|`s – Justin Jul 31 '14 at 23:26
3

Here is a simple solution that works for me:

int value = (a&255)+((b&255)<<8)+((c&255)<<16)+((d&255)<<24);

a is the least significant byte

b is the second least significant byte

c is the second most significant byte

and d is the most significant byte

1

You can also use BigInteger for variable length bytes. You can convert it to Long, Integer or Short, whichever suits your needs.

new BigInteger(bytes).intValue();

or, to pass the sign explicitly (a signum of 1 treats the bytes as a non-negative magnitude):

new BigInteger(1, bytes).intValue();
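A short sketch of how the two constructors differ, assuming a big-endian byte array (which is what `BigInteger` expects). The class name `BigIntegerDemo` and its helper methods are only for illustration:

```java
import java.math.BigInteger;

class BigIntegerDemo {
    // Interprets the bytes as a two's-complement big-endian number.
    public static int signedValue(byte[] bytes) {
        return new BigInteger(bytes).intValue();
    }

    // Interprets the bytes as an unsigned big-endian magnitude.
    public static long unsignedValue(byte[] bytes) {
        return new BigInteger(1, bytes).longValue();
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        System.out.println(signedValue(bytes));   // prints -1
        System.out.println(unsignedValue(bytes)); // prints 4294967295
    }
}
```

Note that `intValue()` keeps only the low-order 32 bits, so calling it on the unsigned form of these bytes would still yield -1; `longValue()` is needed to see the unsigned result.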
Jamel Toms
1
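for (int i = 0; i < buffer.length; i++)
{
   // mask each byte so negative values don't set the upper bits
   a = (a << 8) | (buffer[i] & 0xFF);
   if (i % 4 == 3)
   {
      // a is ready: four bytes (big endian) have been combined
      a = 0;
   }
}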
Andrey
1

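To read 4 bytes as an unsigned integer we should use a long variable, because the sign bit is part of the unsigned number. Combine the masked bytes with | and apply the final mask as a long literal:

long result = ((((bytes[0] & 0xFF) << 8 | (bytes[1] & 0xFF)) << 8 | (bytes[2] & 0xFF)) << 8) | (bytes[3] & 0xFF);
result = result & 0xFFFFFFFFL;

Without the L suffix, 0xFFFFFFFF is the int -1 and the mask would have no effect.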
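On Java 8 or later, an alternative sketch is to decode a signed int first and then widen it with `Integer.toUnsignedLong`. This assumes big-endian input; the class `UnsignedRead` and method `readUnsigned` are hypothetical names.

```java
import java.nio.ByteBuffer;

class UnsignedRead {
    // Hypothetical helper: first four bytes, big endian, as an unsigned value.
    public static long readUnsigned(byte[] bytes) {
        int signed = ByteBuffer.wrap(bytes).getInt();
        // Equivalent to signed & 0xFFFFFFFFL
        return Integer.toUnsignedLong(signed);
    }

    public static void main(String[] args) {
        byte[] all = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        byte[] one = {0, 0, 0, 1};
        System.out.println(readUnsigned(all)); // prints 4294967295
        System.out.println(readUnsigned(one)); // prints 1
    }
}
```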

Mounir
0

The following code reads 4 bytes from array (a byte[]) at position index and returns an int. I tried out most of the code from the other answers on Java 10, plus some other variants I dreamed up.

This code used the least amount of CPU time but allocates a ByteBuffer until Java 10's JIT gets rid of the allocation.

int result;

result = ByteBuffer.
   wrap(array).
   getInt(index);

This code is the best performing code that does not allocate anything. Unfortunately, it consumes 56% more CPU time compared to the above code.

int result;
short data0, data1, data2, data3;

data0  = (short) (array[index++] & 0x00FF);
data1  = (short) (array[index++] & 0x00FF);
data2  = (short) (array[index++] & 0x00FF);
data3  = (short) (array[index++] & 0x00FF);
result = (data0 << 24) | (data1 << 16) | (data2 << 8) | data3;
Nathan
  • If you do this: `(array[i] << 24) | ((array[i + 1] & 0xff) << 16) | ((array[i + 2] & 0xff) << 8) | (array[i + 3] & 0xff)` (ie, no conversion to `short` first) it performs equally well vs the `ByteBuffer` solution. I guess it might be optimized as a common pattern. – john16384 Apr 17 '19 at 13:48
0

Converting a 4-byte array into integer:

//Explicitly declaring anInt=-4, byte-by-byte
byte[] anInt = {(byte)0xff,(byte)0xff,(byte)0xff,(byte)0xfc}; // Equals -4
//And now you have a 4-byte array with an integer equaling -4...
//Converting back to integer from 4-bytes...
int result = ( anInt[0]<<24 ) | ( (anInt[1]<<24)>>>8 ) | ( (anInt[2]<<24)>>>16 ) | ( (anInt[3]<<24)>>>24 );
mark_infinite