
I want to read sizeof(int) bytes from a char* array.

a) In what scenarios do we need to check endianness?

b) How would you read the first 4 bytes, either taking endianness into consideration or not?

EDIT: The sizeof(int) bytes that I have read need to be compared with an integer value.

What is the best approach to this problem?

kal
  • I'm a little confused about what you are trying to do. Could you write some pseudocode, as an example? Are you trying to parse integers from character array? – William Brendel Feb 13 '09 at 06:43
  • I am trying to read sizeof(int) bytes from a char* array and compare them with an integer. The source of the data is a different machine. – kal Feb 13 '09 at 06:57

9 Answers


Do you mean something like this?

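#include <string.h>

char* a;   /* points at the bytes you received */
int i;
memcpy(&i, a, sizeof(i));   /* works even if a is not aligned for int */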

You only have to worry about endianness if the data comes from a different platform, such as an external device.

Dani van der Meer

a) You only need to worry about endianness (i.e., whether bytes must be swapped) if the data was created on a big-endian machine and is being processed on a little-endian machine, or vice versa. This can happen in many ways; here are a couple of examples.

  1. You receive data on a Windows machine via a socket. Windows runs on little-endian hardware (such as x86), while network data is "supposed" to be in big-endian format.
  2. You process a data file that was created on a system with a different "endianness."

In either of these cases, you'll need to byte-swap all numbers that are bigger than 1 byte, e.g., shorts, ints, longs, doubles, etc. However, if you are always dealing with data from the same platform, endian issues are of no concern.
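Whether any swapping is needed depends on the byte order of the machine doing the reading. As a minimal sketch (the function name host_is_little_endian is hypothetical, and the code assumes a platform that provides uint32_t), the host's order can be checked at run time by looking at which byte of a known value is stored first:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the host stores the least significant
   byte first (little-endian), 0 otherwise. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;

    /* Copy the byte stored at the lowest address of probe. */
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

If this returns 1 and the data is known to be big-endian (as network data usually is), multi-byte values need to be swapped before use.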

b) Based on your question, it sounds like you have a char pointer and want to extract the first 4 bytes as an int and then deal with any endian issues. To do the extraction, use this:

int n = *(reinterpret_cast<int *>(myArray)); // where myArray is your data; requires myArray to be suitably aligned for int

This assumes myArray is not a null pointer and points to at least sizeof(int) bytes; otherwise, dereferencing it is undefined behavior and will typically crash, so check your inputs defensively.

To swap the bytes on Windows, you can use the ntohs()/ntohl() and/or htons()/htonl() functions defined in winsock2.h. Or you can write some simple routines to do this in C++, for example:

inline unsigned short swap_16bit(unsigned short us)
{
    return (unsigned short)(((us & 0xFF00) >> 8) |
                            ((us & 0x00FF) << 8));
}

inline unsigned long swap_32bit(unsigned long ul)
{
    return (unsigned long)(((ul & 0xFF000000) >> 24) |
                           ((ul & 0x00FF0000) >>  8) |
                           ((ul & 0x0000FF00) <<  8) |
                           ((ul & 0x000000FF) << 24));
}
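For reference, here is a C sketch of the same idea using fixed-width types from <stdint.h>; the names swap_u16 and swap_u32 are hypothetical. uint32_t is used because unsigned long is 64 bits wide on many 64-bit Unix-like systems, so its width is not guaranteed to be 32:

```c
#include <stdint.h>

/* Hypothetical helper: exchange the two bytes of a 16-bit value. */
static inline uint16_t swap_u16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Hypothetical helper: reverse the four bytes of a 32-bit value. */
static inline uint32_t swap_u32(uint32_t v)
{
    return ((v & 0xFF000000u) >> 24) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x000000FFu) << 24);
}
```

GCC and Clang also provide __builtin_bswap16 and __builtin_bswap32, which usually compile to a single byte-swap instruction.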
Matt Davis
  • You should mention that the first code snippet has the same problem as Daniel's: it can access unaligned data that's not suitable for int*. – Johannes Schaub - litb Feb 13 '09 at 07:46
  • That is the only thing I am missing in Java. It would be awesome at least to be able to read an int from a byte array. Maybe I'll implement some bytecode ops for this in my JVM implementation. – neoexpert Mar 09 '19 at 14:03
3

It depends on how you want to read them. I get the feeling you want to cast 4 bytes into an integer; doing so over streamed network data usually ends up looking something like this:

int foo = *(int*)(stream+offset_in_stream); /* assumes this address is suitably aligned for int */
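Dereferencing the cast pointer requires stream + offset_in_stream to be suitably aligned for int, which streamed data rarely guarantees. A sketch of an alignment-safe alternative (read_int_at is a hypothetical helper) copies the bytes with memcpy instead; note that it still interprets them in the host's byte order:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: read an int stored at any byte offset.
   memcpy has no alignment requirement, unlike *(int *)ptr.
   No byte swapping is done; the host's byte order is assumed. */
int read_int_at(const char *stream, size_t offset_in_stream)
{
    int value;

    memcpy(&value, stream + offset_in_stream, sizeof value);
    return value;
}
```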
Daniel Sloof

The easy way to solve this is to make sure whatever generates the bytes does so in a consistent endianness. Typically the "network byte order" (big-endian) used by TCP/IP protocols is best: the library routines htonl and ntohl work very well with this, and they are usually fairly well optimized.

However, if network byte order is not being used, you may need to do things in other ways. You need to know two things: the size of an integer, and the byte order. Once you know that, you know how many bytes to extract and in which order to put them together into an int.

Some example code that assumes sizeof(int) is the right number of bytes:

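#include <limits.h>
#include <stddef.h>

/* Bytes are read as unsigned char and accumulated in an unsigned int, so a
   byte >= 0x80 is not sign-extended and the shifts cannot overflow. */
int bytes_to_int_big_endian(const char *bytes)
{
    size_t i;
    unsigned int result;

    result = 0;
    for (i = 0; i < sizeof(int); ++i)
        result = (result << CHAR_BIT) | (unsigned char)bytes[i];
    return (int)result;
}

int bytes_to_int_little_endian(const char *bytes)
{
    size_t i;
    unsigned int result;

    result = 0;
    for (i = 0; i < sizeof(int); ++i)
        result |= (unsigned int)(unsigned char)bytes[i] << (i * CHAR_BIT);
    return (int)result;
}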


#ifdef TEST

#include <stdio.h>

int main(void)
{
    const int correct = 0x01020304;
    const char little[] = "\x04\x03\x02\x01";
    const char big[] = "\x01\x02\x03\x04";

    printf("correct:            %x\n", (unsigned)correct);
    printf("from big-endian:    %x\n", (unsigned)bytes_to_int_big_endian(big));
    printf("from little-endian: %x\n", (unsigned)bytes_to_int_little_endian(little));
    return 0;
}

#endif

How about

#include <stddef.h>

int int_from_bytes(const char * bytes, _Bool reverse)
{
    if(!reverse)
        return *(int *)(void *)bytes;

    char tmp[sizeof(int)];

    for(size_t i = sizeof(tmp); i--; ++bytes)
        tmp[i] = *bytes;

    return *(int *)(void *)tmp;
}

You'd use it like this:

int i = int_from_bytes(bytes, SYSTEM_ENDIANNESS != ARRAY_ENDIANNESS);
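SYSTEM_ENDIANNESS and ARRAY_ENDIANNESS are not defined in the answer and are not standard macros. One possible way to define them, assuming a GCC- or Clang-compatible compiler that predefines __BYTE_ORDER__, is sketched below together with an alignment-safe (memcpy-based) int_from_bytes; LITTLE_ENDIAN_ORDER and BIG_ENDIAN_ORDER are hypothetical names, and the incoming data is assumed to be big-endian:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names; not defined by any standard header. */
#define LITTLE_ENDIAN_ORDER 0
#define BIG_ENDIAN_ORDER    1

/* GCC and Clang predefine __BYTE_ORDER__; other compilers may not. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define SYSTEM_ENDIANNESS BIG_ENDIAN_ORDER
#else
/* Assumption: little-endian when the macro is missing. */
#define SYSTEM_ENDIANNESS LITTLE_ENDIAN_ORDER
#endif

/* Assumption: the array holds big-endian (network order) data. */
#define ARRAY_ENDIANNESS BIG_ENDIAN_ORDER

int int_from_bytes(const char *bytes, _Bool reverse)
{
    int tmp;

    if (reverse) {
        /* Write the bytes into tmp back to front. */
        for (size_t i = sizeof(tmp); i--; ++bytes)
            ((char *)&tmp)[i] = *bytes;
    } else {
        memcpy(&tmp, bytes, sizeof(tmp));
    }
    return tmp;
}
```

On a little-endian host this reverses the bytes; on a big-endian host it copies them unchanged, so in both cases the result is the big-endian value.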

If you're on a system where casting void * to int * may cause alignment problems, you can use this instead:

#include <stddef.h>
#include <string.h>

int int_from_bytes(const char * bytes, _Bool reverse)
{
    int tmp;

    if(reverse)
    {
        for(size_t i = sizeof(tmp); i--; ++bytes)
            ((char *)&tmp)[i] = *bytes;
    }
    else memcpy(&tmp, bytes, sizeof(tmp));

    return tmp;
}
Christoph

You shouldn't need to worry about endianness unless you are reading the bytes from a source created on a different machine, e.g., a network stream.

Given that, can't you just use a for loop?

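#include <stddef.h>

void ReadBytes(const char *stream) {
    for (size_t i = 0; i < sizeof(int); i++) {
        char foo = stream[i];   /* one byte of the int; process it here */
        (void)foo;              /* silences the unused-variable warning */
    }
}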

Are you asking for something more complicated than that?

Steve Rowe

You need to worry about endianness only if the data you're reading is composed of numbers that are larger than one byte.
If you're reading sizeof(int) bytes and expect to interpret them as an int, then endianness makes a difference. Essentially, endianness is the order in which a machine arranges the bytes of a multi-byte value in memory, and therefore how it interprets a series of bytes as a number.
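As a worked illustration, assuming 8-bit bytes: for n bytes b_0, ..., b_{n-1} in the order they appear in memory, the two interpretations are as follows, with the bytes 01 02 03 04 as an example:

```latex
v_{\text{big}} = \sum_{k=0}^{n-1} b_k \, 2^{8(n-1-k)}, \qquad
v_{\text{little}} = \sum_{k=0}^{n-1} b_k \, 2^{8k}

% Example: bytes 01 02 03 04, so n = 4
v_{\text{big}} = 1\cdot 2^{24} + 2\cdot 2^{16} + 3\cdot 2^{8} + 4 = 16909060 = \texttt{0x01020304}
v_{\text{little}} = 4\cdot 2^{24} + 3\cdot 2^{16} + 2\cdot 2^{8} + 1 = 67305985 = \texttt{0x04030201}
```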

shoosh

Just use a for loop that moves over the array in sizeof(int) chunks (ntohl works on 32-bit values, so this assumes a 4-byte int).
Use the function ntohl (declared in the header <arpa/inet.h> on Linux and other POSIX systems) to convert from bytes in network order (defined as big-endian) to the local byte order. That library function is implemented to perform the correct network-to-host conversion for whatever processor you're running on.
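A sketch of that loop, assuming a POSIX system (for <arpa/inet.h>) and a buffer of big-endian 32-bit values; decode_be32_array is a hypothetical name. It steps in 4-byte chunks rather than sizeof(int), because ntohl operates on uint32_t:

```c
#include <arpa/inet.h>  /* ntohl (POSIX) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert every complete 4-byte big-endian value in
   buf to host order, storing the results in out. Returns how many values
   were decoded; trailing bytes that do not fill a whole value are ignored. */
size_t decode_be32_array(const char *buf, size_t len, uint32_t *out)
{
    size_t count = 0;

    for (size_t pos = 0; pos + 4 <= len; pos += 4) {
        uint32_t raw;

        memcpy(&raw, buf + pos, sizeof raw);  /* alignment-safe copy */
        out[count++] = ntohl(raw);            /* network -> host order */
    }
    return count;
}
```

memcpy is used to fill raw so that the loop does not depend on buf being aligned for uint32_t.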

Chris Connett
  • Of course, this applies only if you're actually reading something from the network... – gimpf Feb 13 '09 at 06:57
  • Ok, he stated in the _comment_ that he is reading it from a different machine. Well, maybe done by burning/reading a CD, but more probably he indeed meant some kind of network. – gimpf Feb 13 '09 at 06:59

Why read when you can just compare?

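#include <stdbool.h>
#include <string.h>

/* Byte-wise comparison; only meaningful if data uses the host's byte order. */
bool AreEqual(int i, const char *data)
{
   return memcmp(&i, data, sizeof(int)) == 0;
}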

If you do need to worry about endianness, you have to convert all integers to some invariant form first; htonl and ntohl are good examples.
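The memcmp in AreEqual compares raw bytes, so it only works when the data is in the host's byte order. If the data is known to be in network (big-endian) order, one option is to convert the integer with htonl before comparing. A sketch, assuming a POSIX <arpa/inet.h> and 32-bit values (equals_network_u32 is a hypothetical name):

```c
#include <arpa/inet.h>  /* htonl (POSIX) */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: compare a host integer with 4 bytes stored in
   network (big-endian) order. Converting the integer to network order
   first makes the byte-wise memcmp independent of the host's endianness. */
bool equals_network_u32(uint32_t value, const char *data)
{
    uint32_t wire = htonl(value);

    return memcmp(&wire, data, sizeof wire) == 0;
}
```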

okutane