3

Hello, I have a small doubt about little endian and big endian. I know this question has been asked many times, but I could not figure out the points below.

Let's take int i = 10. It is stored in binary as 00000000 00000000 00000000 00001010 in the stack section, as below:

00000000 |00000000 |00000000 |00001010   // In case of little endian
MSB-------------------------------------------LSB

Big endian

00001010 |00000000 |00000000 |00000000   // In case of big endian
MSB-------------------------------------------LSB

In this case, will both little and big endian give the same output, 10?

Then what is the use of having both little and big endian?


I was asked in my interview to implement code that is portable across all systems, whether big or little endian. I replied:

The compiler will take care of it by itself: if int i = 10 on little endian, then on big endian the output is 10 too.

Is that answer correct?

πάντα ῥεῖ
leuage
  • I'm pretty sure you just switch the bytes, not the bit order. – StuartLC Dec 09 '14 at 15:53
  • Only the bytes are switched, not the bit! – nouney Dec 09 '14 at 15:53
  • _"i was asked to implement code which will be portable for all system ..."_ Which kind of code in particular? Sending some numbers over a network, or serializing to files? Other cases usually don't need to care about endianness. – πάντα ῥεῖ Dec 09 '14 at 15:56
  • @StuartLC can you please show a rough diagram? – leuage Dec 09 '14 at 15:59
  • You seem to be confusing MSB (most significant byte) with highest memory address. The number will be stored in the LSB (least significant byte) in both of your examples. However in the little endian the LSB will be lower in memory. – Galik Dec 09 '14 at 16:02
  • @yes rcgldr has it. Endianness is relevant when transferring data in low level formats between disparate systems, e.g. when writing verbatim memory bytes to file or to network packets. – StuartLC Dec 09 '14 at 16:04
  • @StuartLC OK, if both give the same output, then why is this concept so important? Can you please elaborate? – leuage Dec 09 '14 at 16:09
  • @πάνταῥεῖ For example, if int i = 10 on little endian, then it should give the same output on big endian. – leuage Dec 09 '14 at 16:16
  • @yes The binary _output_ for network transmissions or serialization to files will be different on big-/little-endian machines. If you mean the formatted text _output_ (e.g. as with `cout << i;`), it will be the same regardless of the machine's endianness. – πάντα ῥεῖ Dec 09 '14 at 16:19
  • @πάνταῥεῖ How? Can you please explain with a small example for both cases? – leuage Dec 09 '14 at 16:23
  • @yes Done. I hope it's clear enough for you now. – πάντα ῥεῖ Dec 09 '14 at 16:38

4 Answers

4
00000000 | 00000000 | 00000000 | 00001010 // big    endian

00001010 | 00000000 | 00000000 | 00000000 // little endian

Whether data is stored in big endian or little endian mode mostly matters only if you're trying to access a smaller portion of a variable in memory, usually via a pointer: for example, accessing the least significant character of a 32 bit integer via a pointer to character, or via a union with a character array. Another case is reading data from a file directly into an array of 32 bit integers, or writing data from such an array to a file. The data in the file will usually also be stored in either little endian or big endian mode.

As far as I'm aware, there's no generic compile time method to determine if the cpu is running in big endian mode or little endian mode (specific compilers may have defines for this). You could write test code using a union of 32 bit integer and a character array of size 4. Then set the integer in the union to 10, and check to see if the union character array[0] contains the 10 which means little endian mode, or if the union character array[3] contains the 10, which means big endian mode. Other methods to determine if the CPU is in little endian or big endian mode are possible.
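
The runtime test described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `is_little_endian` is made up for this example, and it copies the integer's bytes with `std::memcpy` instead of reading through a union, because reading an inactive union member is undefined behavior in C++ (it is permitted in C).

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the answer): reports whether the least
// significant byte of a 32-bit integer is stored at the lowest address.
bool is_little_endian() {
    const std::uint32_t value = 10;
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);  // copy the in-memory bytes of value
    return bytes[0] == 10;                     // 10 in bytes[0] means little endian
}
```

The check only distinguishes little endian from everything else; on a big-endian machine the 10 ends up in `bytes[3]` and the function returns false.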

Once you determine if the cpu is in little endian or big endian mode, you can include conditional code to handle both cases, such as the file I/O to / from an array of 32 bit integers. If you wanted the file data to be in big endian mode, but your cpu is in little endian mode, you'd have to reverse the bytes of each integer before writing or after reading from a file.
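
Reversing the bytes of each integer, as described above, might look like the sketch below. `swap_bytes32` is a hypothetical name chosen for this example; many compilers and platforms offer their own byte-swap facilities, which are not assumed here.

```cpp
#include <cstdint>

// Hypothetical helper: reverses the byte order of a 32-bit value using only
// shifts and masks, so it behaves the same on any host.
std::uint32_t swap_bytes32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |   // lowest byte moves to the top
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);    // highest byte moves to the bottom
}
```

Applying the function twice gives back the original value, so the same helper serves for both writing and reading.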

You could also write code sequences to store data in big endian mode, regardless of the cpu mode. It would waste time if already in big endian mode, but it works for both big and little endian mode:

#include <stdint.h>

unsigned char   buffer[256];  /* unsigned, so byte values above 127 are stored unchanged */
unsigned char * ptr2char;
uint32_t        uint32bit;
/* ... */
    ptr2char = buffer;    /* store uint32bit in big endian mode */
    *ptr2char++ = (uint32bit >> 24)&0xff;
    *ptr2char++ = (uint32bit >> 16)&0xff;
    *ptr2char++ = (uint32bit >>  8)&0xff;
    *ptr2char++ = (uint32bit      )&0xff;
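
The reading side can be handled the same way. The sketch below pairs a store with the matching load; both function names are invented for this example, and `unsigned char` is used so that byte values above 127 are not affected by a signed `char`.

```cpp
#include <cstdint>

// Hypothetical helper: writes v into buf[0..3], most significant byte first.
void store_be32(unsigned char* buf, std::uint32_t v) {
    buf[0] = static_cast<unsigned char>((v >> 24) & 0xff);
    buf[1] = static_cast<unsigned char>((v >> 16) & 0xff);
    buf[2] = static_cast<unsigned char>((v >>  8) & 0xff);
    buf[3] = static_cast<unsigned char>( v        & 0xff);
}

// Hypothetical helper: rebuilds the value from four big-endian bytes.
std::uint32_t load_be32(const unsigned char* buf) {
    return (static_cast<std::uint32_t>(buf[0]) << 24) |
           (static_cast<std::uint32_t>(buf[1]) << 16) |
           (static_cast<std::uint32_t>(buf[2]) <<  8) |
            static_cast<std::uint32_t>(buf[3]);
}
```

Because both functions work on values through shifts rather than on memory layout, neither needs to know the host's endianness.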
rcgldr
  • @rcgldr 00000000 | 00000000 | 00000000 | 00001010 // big endian. I guess if the compiler starts reading from right to left, it is little endian, isn't it? 00001010 | 00000000 | 00000000 | 00000000 // little endian – leuage Dec 09 '14 at 16:05
  • @TomásBadan you are wrong. If you access `int` through union with `char[]` or simply `char *` which points to address of `int` what you get will be different on big and little endian platforms. – Slava Dec 09 '14 at 16:26
  • @TomásBadan and? you run it on little endian platform and got 10 (I assume ideone uses Intel or AMD), try to run it on big endian one. – Slava Dec 09 '14 at 16:49
  • @rcgldr OK, I got your code. My last question: as both produce the same output, 10, in the above example, why did little and big endian come into the picture, and why do we need them? – leuage Dec 09 '14 at 16:54
  • @yes endianness is defined by the CPU of the platform where you run your code. For why they use different ones, you should refer to that CPU maker's documentation. – Slava Dec 09 '14 at 16:57
  • @Slava All I want to know is why we need systems with different endianness if the output is the same. What is the advantage? – leuage Dec 09 '14 at 17:04
  • @yes again, calculations on computers are done by the CPU. Some CPUs, like Intel and compatibles, use little endian representation; others, like Sun and HP, use big endian. For why CPU makers decided to use one or the other, you should look for their explanation. It probably makes the internal circuitry simpler for their design. – Slava Dec 09 '14 at 17:09
  • _@rcfldr_ While you're correctly answering the OP's confusion about the actual byte order of big-/little-endian values, your answer doesn't really point out the consequences/possible solutions for portability between BE/LE machine architectures. – πάντα ῥεῖ Dec 09 '14 at 18:41
  • @πάνταῥεῖ - I updated my post to include a couple of examples, but trying to cover all the possible issues and solutions in detail probably isn't needed. Once the poster figures out how to handle some situations, he can use that knowledge to deal with other situations. – rcgldr Dec 09 '14 at 20:42
  • @rcgldr _"As far as I'm aware, there's no generic compile time method to determine if the cpu is running in big endian mode or little endian mode"_ [What about these?](http://stackoverflow.com/questions/1001307/detecting-endianness-programmatically-in-a-c-program) It's arguable whether you want/need to detect endianness at runtime, though. Your answer isn't really concise/usable for real life (you would have failed _the interview question_ if I had been asking it). – πάντα ῥεῖ Dec 09 '14 at 20:54
  • @πάνταῥεῖ - The issue is determining endianness at compile time via macros, defines, ..., as opposed to runtime, for which there are many methods. I'm not sure there is a concise answer for "implement code which will be portable for all systems", especially if the systems will be exchanging data via communication and/or files, and endianness is low on the scale of issues when making a program portable to "all systems". Would "all systems" include embedded code for a smart device, 8 bit project boards that run CP/M, PCs, mainframes? – rcgldr Dec 10 '14 at 02:46
1

Just to correct your diagram for the integer: int i = 10;

// Big endian
&i <- address of i
00000000 |00000000 |00000000 |00001010 // In case of big endian

MSB---------------------------LSB


// Lower memory -----------------> higher memory


// Little endian

00001010 |00000000 |00000000 |00000000 // In case of little endian
&i <- address of i
LSB---------------------------MSB

In little endian the Least Significant Byte (LSB) is stored in the lowest memory address.

In big endian the Most Significant Byte (MSB) is stored in the lowest memory address.
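
To see this on a real machine, one could print the bytes of `i` from the lowest address upward. The helper name `bytes_in_memory_order` is an assumption made for this sketch; it is not part of any library.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: returns the bytes of an int as hex text, starting at
// the lowest memory address (&i) and moving toward higher addresses.
std::string bytes_in_memory_order(int i) {
    unsigned char bytes[sizeof i];
    std::memcpy(bytes, &i, sizeof i);  // copy the in-memory representation
    std::string out;
    for (std::size_t k = 0; k < sizeof i; ++k) {
        char hex[4];  // two hex digits, one space, terminating NUL
        std::snprintf(hex, sizeof hex, "%02x ", static_cast<unsigned>(bytes[k]));
        out += hex;
    }
    return out;
}
```

With a 4-byte `int`, `bytes_in_memory_order(10)` yields `0a 00 00 00 ` on a little-endian machine and `00 00 00 0a ` on a big-endian one.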

Galik
  • in 3rd line it should be LSB----MSB? – leuage Dec 09 '14 at 16:21
  • @yes no. LSB means *Least Significant Byte*. It holds the *smallest value* part of the number. – Galik Dec 09 '14 at 16:27
  • How can we say that the left one is only the LSB and not the MSB? I read somewhere that the compiler starts reading instructions from right to left. – leuage Dec 09 '14 at 16:38
  • @yes The left one is not *only* **LSB**. It is **LSB** in *little endian* but **MSB** in *big endian*. This is not related to how the *compiler* reads instructions. It is about how the CPU reads data. It will, though, have an impact on some compiler (or assembler) functions. – Galik Dec 09 '14 at 16:56
  • @Galik _"Just to correct your diagram for the integer: ..."_ While you're technically correct, regarding the wrong sample given in the OP, that doesn't answer when you need to care about portability, or not. – πάντα ῥεῖ Dec 09 '14 at 18:37
1

First of all: you actually confused big- and little-endian byte order, as pointed out in @rcgldr's and @Galik's answers. The byte order is exactly the reverse of what you show in your sample:

00000000 | 00000000 | 00000000 | 00001010 // big endian

00001010 | 00000000 | 00000000 | 00000000 // little endian

As for your assumptions and questions:

"In This both little and big endian will give same output 10 ?"

It depends on the kind of output you're referring to.

  1. The following code will be portable regardless of the host machine's endianness; the output is formatted text ("10") in any case:

int i = 10;

std::cout << i << std::endl;

  2. The following code will not be portable. Since the values are written in binary form, the byte order will be kept verbatim:

int i = 10;

std::ofstream binfile("binaryfile.bin", std::ios::binary); // binary mode: no newline translation
binfile.write(reinterpret_cast<const char*>(&i), sizeof i);

The latter sample will not work if the file is read on a host machine with a different endianness.

To solve this kind of problem there's the htonl(), ntohl() function family. Usually one agrees to use network byte order (big-endian) format to store binary data or send it over the network.

Here's a short sample showing how to use the mentioned byte order conversion functions:


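#include <arpa/inet.h> // htonl(), ntohl() on POSIX; <winsock2.h> on Windows
#include <cstdint>
#include <fstream>

// Writer side
std::uint32_t i = 10;
std::uint32_t sendValue = htonl(i); // convert the value of i to network byte order

std::ofstream outfile("binaryfile.bin", std::ios::binary);
outfile.write(reinterpret_cast<const char*>(&sendValue), sizeof sendValue); // write the adapted value
outfile.close();

// Reader side
std::ifstream infile("binaryfile.bin", std::ios::binary);
std::uint32_t recvValue = 0;
infile.read(reinterpret_cast<char*>(&recvValue), sizeof recvValue); // read the value in network byte order
std::uint32_t j = ntohl(recvValue); // convert the value of recvValue to host byte order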

"Then what is the use of these both little and big endian?"

The reason (use) for the different formats is that there are different CPU architectures, which use different ways to represent integer values in memory, depending on what's the most efficient way of accessing them for their particular hardware design.
Neither of these architectural choices is better or worse than the other; that's why it's called endianness. The term originates from Jonathan Swift's novel "Gulliver's Travels" and was first (?) applied to byte order in Danny Cohen's article "ON HOLY WARS AND A PLEA FOR PEACE".


"compiler will do it self like if int i=10 in little endian then in big endian too it is 10 as output"

Well, as you can see from the examples above, this answer was wrong.

πάντα ῥεῖ
1

Endianness matters in the following situations:

  1. You're directly examining/manipulating bytes in a multi-byte type
  2. You're serializing binary data, or transferring binary data between different architectures

Directly examining/manipulating bytes in a multi-byte type

For example, suppose you want to split out and display the binary representation of a 32-bit IEEE float. The following shows the layout of a float and the addresses of the corresponding bytes in both big- and little-endian architectures:

A        A+1      A+2      A+3        Big endian
-------- -------- -------- --------   s = sign bit
seeeeeee efffffff ffffffff ffffffff   e = exponent bit
-------- -------- -------- --------   f = fraction bit
A+3      A+2      A+1      A          Little Endian
-------- -------- -------- --------
A+1      A        A+3      A+2        "Middle" Endian (VAX)

The sign bit is in the most significant byte (MSB) of a float. On a big-endian system, the MSB is in byte A; on a little-endian system, it's in byte A+3. On some oddballs like the old VAX F float, it's stuck in the middle at byte A+1.

So if you want to mask out the sign bit, you could do something like the following:

float val = some_value();
unsigned char *p = (unsigned char *) &val; // treat val as an array of unsigned char

// Assume big-endian to begin with
int idx = 0;

if ( little_endian() )
  idx = 3;

int sign = (p[idx] & 0x80) >> 7;
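
If only the sign is needed, byte indexing can be avoided altogether by copying the float into an integer of the same size and shifting. This is a sketch under two assumptions that are not stated in the answer: `float` is a 32-bit IEEE 754 type, and floats and integers use the same byte order on the host (true of mainstream platforms). The name `sign_bit` is made up for this example.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the sign bit of a float, i.e. 1 for values
// such as -1.5f or -0.0f and 0 for values such as 2.0f or 0.0f.
int sign_bit(float val) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &val, sizeof bits);  // copy the bytes into an integer
    return static_cast<int>(bits >> 31);    // the sign is the most significant bit
}
```

Since the shift operates on the integer's value rather than on its bytes in memory, no `little_endian()` check is needed here.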

Serializing or transferring binary data

For another example, suppose you want to save binary (not text) data such that it can be read by either big- or little-endian systems, or you're transferring binary data from one system to another. The convention for Internet transfers is big-endian (MSB first), so before sending a message over the 'net, you'd use calls like htonl (host-to-network long) and htons (host-to-network short) to perform any necessary byte swaps:

uint32_t host_value = some_value();
uint32_t network_value = htonl( host_value ); 
send( sock, &network_value, sizeof network_value, 0 ); 

On a little-endian system like x86, htonl will reorder the bytes of host_value from 0,1,2,3 to 3,2,1,0 and save the result to network_value. On a big-endian system, htonl is basically a no-op. The inverse operations are ntohl and ntohs.

If you're not doing anything like the above, then you generally don't have to worry about endianness at all.

John Bode