110

I ask because I am sending a byte stream from a C process to Java. On the C side, the 32-bit integer has the LSB as the first byte and the MSB as the fourth byte.

So my question is: on the Java side, when we read the bytes as they were sent from the C process, what is the endianness on the Java side?

A follow-up question: if the endianness on the Java side is not the same as the one sent, how can I convert between them?

Basil Bourque
hhafez
  • 6
    Here is my mnemonic for this so I won't forget: Java, being not hardware but virtual, is the language of the internet. The **network byte order** is **big endian**. Therefore, Java is **big endian**. – daparic Jun 20 '20 at 21:30

9 Answers

72

Use the network byte order (big endian), which is the same as Java uses anyway. See man htons for the different translators in C.
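As a rough sketch of the Java side, assuming the C process has already put the value into network byte order (for a 32-bit value that would be htonl, the sibling of htons), DataInputStream.readInt() reads exactly that order. The class name, method name, and the in-memory byte array standing in for the socket are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; the byte array stands in for data received from the C process.
class NetworkOrderRead {
    // Reads one 32-bit value that was sent in network byte order (big endian).
    static int readNetworkInt(byte[] wire) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(wire))) {
            return in.readInt(); // readInt() treats the first byte as the most significant
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = {0x00, 0x00, 0x01, 0x02}; // 0x00000102, most significant byte first
        System.out.println(readNetworkInt(wire)); // prints 258
    }
}
```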

Egil
  • I'm not at my linux box now but is htons one of the standard libs? – hhafez Dec 12 '08 at 10:46
  • According to http://h30097.www3.hp.com/docs//base_doc/DOCUMENTATION/V51_HTML/MAN/MAN3/0383____.HTM it's part of the standard C library, yes – Egil Dec 12 '08 at 10:49
  • I'll give this a try next Monday, but it looks promising – hhafez Dec 12 '08 at 10:56
  • 1
    htons is available almost everywhere, but it's not in ISO C. – MSalters Dec 12 '08 at 11:01
  • 1
    If you have to use something other than network byte order, then you either roll your own with bitwise operators or use the various versions of java.nio.Buffer – Darron Dec 12 '08 at 22:04
  • 1
    According to its man-page it's defined in POSIX.1, so it should be available pretty much everywhere. And I seem to remember using it in Win32, so it's not just on POSIX systems either. – Joachim Sauer Dec 12 '08 at 22:04
  • Ugh bloody big endian. Why does anyone still use that when all popular processor architectures are little endian? /grumble. – Timmmm Nov 12 '15 at 13:18
61

I stumbled here via Google and got my answer that Java is big endian.

Reading through the responses I'd like to point out that bytes do indeed have an endian order, although mercifully, if you've only dealt with “mainstream” microprocessors you are unlikely to have ever encountered it as Intel, Motorola, and Zilog all agreed on the shift direction of their UART chips and that MSB of a byte would be 2**7 and LSB would be 2**0 in their CPUs (I used the FORTRAN power notation to emphasize how old this stuff is :) ).

I ran into this issue with some Space Shuttle bit serial downlink data 20+ years ago when we replaced a $10K interface hardware with a Mac computer. There is a NASA Tech brief published about it long ago. I simply used a 256 element look up table with the bits reversed (table[0x01]=0x80 etc.) after each byte was shifted in from the bit stream.
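For anyone curious what such a table looks like, here is a minimal sketch in Java. It is not the original code from that project; the class and method names are invented, and the table is computed at class-load time rather than written out by hand.

```java
// Hypothetical helper; names are illustrative, not taken from the original project.
class BitReverseTable {
    // TABLE[v] holds the value of byte v with its eight bits in reverse order.
    static final int[] TABLE = new int[256];

    static {
        for (int value = 0; value < 256; value++) {
            int reversed = 0;
            for (int bit = 0; bit < 8; bit++) {
                if ((value & (1 << bit)) != 0) {
                    reversed |= 1 << (7 - bit); // bit 0 moves to bit 7, bit 1 to bit 6, ...
                }
            }
            TABLE[value] = reversed;
        }
    }

    // Reverses the bit order of one received byte.
    static byte reverse(byte b) {
        return (byte) TABLE[b & 0xFF]; // mask first so negative bytes index correctly
    }

    public static void main(String[] args) {
        System.out.printf("0x%02X%n", TABLE[0x01]); // prints 0x80
    }
}
```

A lookup costs one array access per byte, which is presumably why a table was preferred over shifting bits one at a time.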

jiwopene
WB Greene
  • Great insight! I had this question and found no answers on the web. – Xolve Aug 28 '13 at 13:46
  • 1
    If any of them are public, could you link the NASA tech brief (and perhaps the Space Shuttle bit serial downlink data) you are talking about? It would be fascinating; I've never seen a thing like that. – n611x007 Nov 08 '13 at 16:43
  • 4
    Bitwise endianness also comes into play with compression formats that use some form of Huffman encoding (i.e. all of them). For extra fun, JPEG is "bitwise big-endian" (i.e. the most significant bit is the "first" bit) and LZ is "bitwise little-endian". I once worked on a proprietary compression format that used both formats under the hood. Oh, that was fun... – user435779 Aug 05 '14 at 14:31
  • Having started in bits, I thought THAT was endianness for a long time. – Roy Falk Mar 30 '16 at 07:24
22

Java has no unsigned integer types (apart from the 16-bit char). All integers are signed and big-endian.

On the C side each byte has the LSB at the start, on the left, and the MSB at the end.

It sounds like you are using LSB as Least significant bit, are you? LSB usually stands for least significant byte. Endianness is not bit based but byte based.

To convert from unsigned byte to a Java integer:

int i = (int) b & 0xFF;

To convert from unsigned 32-bit little-endian in byte[] to Java long (from the top of my head, not tested):

long l = (long)b[0] & 0xFF;
l += ((long)b[1] & 0xFF) << 8;
l += ((long)b[2] & 0xFF) << 16;
l += ((long)b[3] & 0xFF) << 24;
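
The same idea as a self-contained sketch that can be compiled and run. The class name, method name, and the offset parameter are my own additions for illustration; the masking and shifting follow the lines above.

```java
// Hypothetical wrapper around the conversion shown above.
class LittleEndianToLong {
    // Combines four little-endian bytes, starting at offset, into an
    // unsigned 32-bit value held in a long (so it never goes negative).
    static long toUnsignedInt(byte[] b, int offset) {
        return  (b[offset]     & 0xFFL)
             | ((b[offset + 1] & 0xFFL) << 8)
             | ((b[offset + 2] & 0xFFL) << 16)
             | ((b[offset + 3] & 0xFFL) << 24);
    }

    public static void main(String[] args) {
        byte[] fromC = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        System.out.println(toUnsignedInt(fromC, 0)); // prints 4294967295
    }
}
```

Masking with 0xFFL matters because Java's byte is signed: without it, any byte of 0x80 or above would sign-extend and corrupt the higher bits.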
jww
Jonas Elfström
  • Just realised that :$ So how am I supposed to send this unsigned little-endian value to my Java process so it reads it correctly? – hhafez Dec 12 '08 at 10:28
  • What I mean by the start is that the LSB is at the start of the 4 bytes (it's an unsigned 32-bit int), so I did mean least significant byte – hhafez Dec 12 '08 at 10:40
  • Also I'm converting from C -> Java not from Java -> C :) – hhafez Dec 12 '08 at 10:57
  • Your code works fine, so long as you remove the semi-colon after 0xFF in the last three lines. I'd edit it myself, but that's a change of less than 6 characters. – Moose Morals Mar 03 '16 at 21:57
  • 1
    It took almost 8 years but finally someone spotted the syntax error. Thanks @MooseMorals :) – Jonas Elfström Mar 04 '16 at 10:04
12

There's no way this could influence anything in Java, since there's no (direct non-API) way to map some bytes directly into an int in Java.

Every API that does this or something similar defines the behaviour pretty precisely, so you should look up the documentation of that API.

Joachim Sauer
  • 4
    Oh sure there is. Binary math (&, |, <<, etc) works just fine on bytes and ints. It's quite easy to take arbitrary bytes and stick them into an integer. – Herms Dec 12 '08 at 21:57
  • 8
    But if you do this, you still can't tell what endianness your JVM uses internally. – Darron Dec 12 '08 at 22:02
  • 4
    Yes, but even there you're not directly mapping. You are using arithmetic that does exactly what you tell it, there's no ambiguity. In C you could always cast a "byte*" to a "long*" and de-reference it. Then you'd have to care about endianness. In Java there's no direct, ambiguous way to do that. – Joachim Sauer Dec 12 '08 at 22:02
  • Ah, I see. You were talking about the cast, not the binary math. Yea, in that case you're right. – Herms Dec 15 '08 at 14:57
  • 12
    **+1** for the "look up the documentation", but **NOTE:** the 1st sentence is not correct anymore since nowadays the NIO package offers ByteBuffer which can map bytes to primitives and where you can change the byte order. See [ByteBuffer](http://download.oracle.com/javase/6/docs/api/java/nio/ByteBuffer.html) and [ByteOrder](http://download.oracle.com/javase/6/docs/api/java/nio/ByteOrder.html) – user85421 Apr 12 '11 at 11:13
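A minimal sketch of the ByteBuffer route mentioned in the last comment, assuming the four bytes are already in a byte[]. The class and method names are invented for the example; ByteBuffer.wrap, order, and getInt are the standard java.nio calls.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class showing both byte orders on the same input.
class ByteBufferOrder {
    // Interprets the first four bytes as a little-endian int.
    static int readLittleEndianInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // Interprets the first four bytes as a big-endian int (ByteBuffer's initial order).
    static int readBigEndianInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    public static void main(String[] args) {
        byte[] bytes = {0x01, 0x00, 0x00, 0x00};
        System.out.println(readLittleEndianInt(bytes)); // prints 1
        System.out.println(readBigEndianInt(bytes));    // prints 16777216
    }
}
```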
4

Imho there is no endianness defined for Java. The endianness is that of the hardware, but Java is high-level and hides the hardware, so you don't have to worry about that.

The only endianness-related feature is how the Java library maps int and long to byte[] (and back). It does so big-endian, which is the most readable and natural:

int i = 0xAABBCCDD;

maps to

byte[] b = {(byte) 0xAA, (byte) 0xBB, (byte) 0xCC, (byte) 0xDD};
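
One way to see that mapping for yourself, sketched with DataOutputStream.writeInt writing into an in-memory buffer. The class and method names are hypothetical; other library routes (ByteBuffer.putInt, for instance) should give the same byte order by default.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class: serializes an int with the standard library and shows the bytes.
class IntToBytes {
    // Returns the four bytes that DataOutputStream.writeInt emits for value.
    static byte[] toBytes(int value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(4);
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(value); // most significant byte is written first
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte x : toBytes(0xAABBCCDD)) {
            hex.append(String.format("%02X", x & 0xFF)); // mask so the byte prints as unsigned
        }
        System.out.println(hex); // prints AABBCCDD
    }
}
```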
Javaddict
3

I would read the bytes one by one, and combine them into a long value. That way you control the endianness, and the communication process is transparent.
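Roughly what that could look like when reading straight from an InputStream, assuming the sender writes the least significant byte first as in the question. The class and method names are placeholders.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical reader that assembles the value one byte at a time.
class ByteByByteReader {
    // Reads four bytes, least significant first, and returns the unsigned 32-bit value.
    static long readUnsignedIntLE(InputStream in) throws IOException {
        long value = 0;
        for (int i = 0; i < 4; i++) {
            int next = in.read(); // 0..255, or -1 at end of stream
            if (next < 0) {
                throw new EOFException("stream ended after " + i + " bytes");
            }
            value |= (long) next << (8 * i); // byte i lands in bits 8*i .. 8*i+7
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {0x78, 0x56, 0x34, 0x12});
        System.out.println(Long.toHexString(readUnsignedIntLE(in))); // prints 12345678
    }
}
```

To switch to big endian you would only change the shift to 8 * (3 - i); nothing else depends on the byte order.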

Wouter Lievens
  • Care to comment on why you're voting me down? – Wouter Lievens Dec 12 '08 at 11:09
  • Because even if I were to read each byte individually, the endianness of the byte that is sent would be incorrect, so I would need to convert it – hhafez Dec 12 '08 at 21:48
  • 25
    Endianness of a byte? What the hell is that? Words have are sensitive to endianness, individual bytes don't. – Wouter Lievens Feb 19 '09 at 10:13
  • 4
    @hhafez That is not true; bytes do not have endianness as far as we need to be concerned. If you read byte by byte, you, the programmer, are responsible for assigning the bytes to the proper place. That is exactly what DataInputStream does: it just assembles the bytes together in a big-endian way under the hood. – nos Aug 20 '10 at 17:15
  • 3
    @WouterLievens: I've encountered some I/O devices (e.g. a real-time clock chip) which, for whatever reason, send out data in bit-reversed format; after receiving data from them, it's necessary to reverse the bits in each byte. I agree with you, though, that endianness of bytes is not *generally* an issue, unless one has to deal with particular oddly-designed pieces of hardware. – supercat Dec 17 '13 at 17:47
  • Might be a FIFO/LIFO kind of thing, I guess. Interesting anecdote :) – Wouter Lievens Dec 18 '13 at 18:51
  • @WouterLievens Sorry, can you please help me understand what you mean by: "Words have are sensitive to endianness, individual bytes don't." I am confused with "Words have are sesitive..." ??? – Koray Tugay Jan 20 '16 at 14:21
  • It's a grammatical error. What I meant to say there (seven years ago?!) is that words have endianness, bytes generally don't, because endianness is (informally) about how to combine bytes arithmetically into a word. – Wouter Lievens Jan 20 '16 at 14:23
  • @WouterLievens I am confused about one thing. Are we talking about the order of bytes in the CPU's registers, the order of bytes in memory, the order of bytes on disk, or all of them? – Koray Tugay Jan 20 '16 at 19:19
  • All of them, I guess – Wouter Lievens Jan 20 '16 at 19:23
  • @WouterLievens Jim here http://stackoverflow.com/questions/4504775/endianness-inside-cpu-registers says otherwise, so I am confused.. – Koray Tugay Jan 20 '16 at 21:17
3

Java is 'big-endian' as noted above. That means that the most significant byte of an int comes first (leftmost in a hex dump) when Java's I/O classes write it out. The sign bit is also in the most significant byte for all of Java's signed integer types.
Reading a 4 byte unsigned integer from a binary file stored by a 'Little-endian' system takes a bit of adaptation in Java. DataInputStream's readInt() expects Big-endian format.
Here's an example that reads a four byte unsigned value (as displayed by HexEdit as 01 00 00 00) into an integer with a value of 1:

 // Read the four unsigned bytes into an array of shorts
 // (dIStream is an already-open DataInputStream)
 short[] tempShort = new short[4];
 for (int b = 0; b < 4; b++) {
    tempShort[b] = (short) dIStream.readUnsignedByte();
 }
 int curVal = convToInt(tempShort);

 // Combine four unsigned byte values, least significant byte first, into an int.
 // Values above 0x7FFFFFFF come out negative because Java's int is signed.
 public int convToInt(short[] sb)
 {
   int answer = sb[0];
   answer += sb[1] << 8;
   answer += sb[2] << 16;
   answer += sb[3] << 24;
   return answer;
 }
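
An alternative sketch that avoids the manual shifting: let readInt() read the four bytes as big-endian, then swap them with Integer.reverseBytes. Integer.toUnsignedLong (Java 8 and later) keeps values above 0x7FFFFFFF from showing up negative. The class and method names here are invented, and the byte array stands in for the file contents.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; not part of the code above.
class ReverseBytesRead {
    // Reads four little-endian bytes and returns them as an unsigned 32-bit value.
    static long fromLittleEndian(byte[] fourBytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(fourBytes))) {
            int bigEndianView = in.readInt();                  // bytes taken most significant first
            int swapped = Integer.reverseBytes(bigEndianView); // flip to the intended order
            return Integer.toUnsignedLong(swapped);            // widen without sign extension
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromLittleEndian(new byte[] {0x01, 0x00, 0x00, 0x00})); // prints 1
    }
}
```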
  • 1
    What does "noted above" refer to? The order in which SO answers are displayed can vary. – LarsH May 11 '20 at 13:49
2

If it fits the protocol you use, consider using a DataInputStream, where the behavior is very well defined.

Jens Bannmann
Ilja Preuß
-1

Java does indeed enforce big endian: https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-2.html#jvms-2.11

  • 3
    This is about endianness of the bytecode instructions, not endianness of the data at runtime. – kaya3 Dec 04 '19 at 22:23
  • I'm voting up. This snippet `byte[] bbb = ByteBuffer.allocate(4).putFloat(0.42f).array();` produced a `byte` array that is the reverse of what my `C/C++` produced. Therefore, the **big endianness** of Java takes effect even in the data at runtime. – daparic Jun 20 '20 at 21:27