Read bytes methods in C / C++

Question

I am new to C and I was wondering if there are standard library methods to read bytes/int/long, such as getChar(), getInt(), getLong().

So for instance, if I call getInt(), it will return the 4 bytes as a string and move the char pointer address by 4. Where can I find these methods?

It is not really clear what you are trying to do. `getInt()` from where and as what? A `getInt()` that returns a char[4] doesn't make much sense to me. — pmr, Sep 16 '11 at 12:13

Kerrek SB · Answer 1 · 2011-09-16T12:39:51.560

No, the standard library does not systematically support binary (de)serialization. The read() function will move the stream pointer along, but I don't think you can get around a platform-dependent piece of code for interpreting the byte stream:

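#include <cstdint>
#include <fstream>

// open in binary mode so no newline translation is applied
std::ifstream thefile("data.bin", std::ios::binary);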

float f;
double d;
uint32_t i;

// the following is OK and doesn't constitute type punning
char * const pf = reinterpret_cast<char*>(&f);
char * const pd = reinterpret_cast<char*>(&d);
char * const pi = reinterpret_cast<char*>(&i);

// the following may or may not give you what you expect
// Caveat emptor, and add your own platform-specific code here.
thefile.read(pf, sizeof(float));
thefile.read(pd, sizeof(double));
thefile.read(pi, sizeof(uint32_t));

In the case of reading unsigned integral values only, you can perform an algebraic extraction which is in some sense type safe and only requires you to know the endianness of the serialized data format:

unsigned char buf[sizeof(uint32_t)];
thefile.read(reinterpret_cast<char*>(buf), sizeof(uint32_t));

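// widen each byte first so the shifts are done in uint32_t, not in (signed) int
uint32_t n = uint32_t(buf[0]) + (uint32_t(buf[1]) << 8) + (uint32_t(buf[2]) << 16) + (uint32_t(buf[3]) << 24); // little-endian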

Reading floating point data in binary is particularly irksome because you have to know quite a lot of extra information about your data stream: Does it use IEEE754? (Does your platform?) What's the endianness (float endianness is independent from integer endianness)? Or is it represented as something else entirely? Good documentation of the file format is crucial.

In C, you would use fread() and C-style casts, char * const pf = (char*)(&f).
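
A minimal C sketch of that idea, combining fread() with the byte-wise little-endian extraction shown above. The helper name read_u32_le and its return convention are assumptions for illustration, not a standard library function:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read a 32-bit little-endian unsigned value from fp.
 * Returns 1 on success and 0 if fewer than 4 bytes could be read. */
int read_u32_le(FILE *fp, uint32_t *out)
{
    unsigned char buf[4];

    assert(fp != NULL && out != NULL);
    if (fread(buf, 1, sizeof buf, fp) != sizeof buf)
        return 0;

    /* Widen each byte before shifting so the arithmetic is done in uint32_t. */
    *out = (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
    return 1;
}
```

Because the value is assembled arithmetically from individual bytes, the result does not depend on the byte order of the machine running the code, only on the byte order of the file.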

Blagovest Buyukliev · Accepted Answer · 2011-09-16T12:30:31.590

Since pointer arithmetic is in the very nature of C, such Java-like functions are not available there.

To get an int out of some memory buffer you would do:

/* assuming that buf is of type void * */
int x = *((int *) buf);
/* advance to the position after the end of the int */
((int *) buf)++;

or more compactly:

int x = *((int *) buf)++;
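
As the comments below point out, the cast-and-dereference form relies on buf being suitably aligned, and `((int *) buf)++` is not accepted by a conforming compiler. A hedged alternative sketch: copy the bytes with memcpy() and advance a byte pointer explicitly. The name get_int and the pointer-to-cursor parameter are assumptions for illustration; the caller must still ensure that at least sizeof(int) bytes remain.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy one int out of the buffer at *cursor and
 * advance *cursor past it. The bytes are interpreted in the host's
 * native byte order, just like the cast in the snippet above. */
int get_int(const unsigned char **cursor)
{
    int value;

    assert(cursor != NULL && *cursor != NULL);
    memcpy(&value, *cursor, sizeof value);  /* no alignment requirement */
    *cursor += sizeof value;                /* step past the bytes just read */
    return value;
}
```

A call then looks like `int x = get_int(&p);` where p is a `const unsigned char *` pointing into the buffer.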

edited Sep 16 '11 at 12:30

answered Sep 16 '11 at 12:14

Blagovest Buyukliev

Is that actually allowed, or is that cast UB? – Kerrek SB Sep 16 '11 at 12:22
`*((int *) buf)` is not UB - it just casts a void pointer to a typed pointer in order to dereference it. Of course, you need to make sure that `buf` is of a sufficient size. – Blagovest Buyukliev Sep 16 '11 at 12:25
Oh, sorry, you're talking about the C version with void pointers. Yeah, that's fine. – Kerrek SB Sep 16 '11 at 13:11
Casting the pointer to `int*` is not undefined behavior, but accessing memory through the resulting pointer is. This response is simply false, and doesn't work except in very special cases. – James Kanze Sep 16 '11 at 13:12
-1 Dereferencing the pointer after the cast is undefined behavior, and using `++` on the results of the cast shouldn't even compile. – James Kanze Sep 16 '11 at 13:14
I guess i can write my own function such as int getInt(char *buf) { return *((int *) buf)++; } – JasonKeef Sep 16 '11 at 13:17
@James - you are right that some compilers would not consider `(int *) buf` to be a valid lvalue, so applying the `++` operator would not compile. But would you explain how dereferencing that pointer is UB? I've done this thing many times. – Blagovest Buyukliev Sep 16 '11 at 13:30
@Blagovest It's not that "some compilers would not consider `(int*)buf` to be a valid lvalue"; it's that both the C and the C++ standards say that it is **not** an lvalue, and that applying `++` to it requires a diagnostic. Similarly, if an object has not been declared and initialized as an `int`, accessing it through an lvalue expression of type `int` is undefined behavior: in this particular case, for example, the pointer might not be sufficiently aligned, or the bytes might contain a trapping value when viewed as an `int`. – James Kanze Sep 16 '11 at 13:40
Why not simply alias the buffer with another name and an `int*` type? E.g. `int* iBuf = (int*) buf;` which you can then use like a normal array `iBuf[1]`. That is most definitely not UB. Also you can try @Jens Gustedt's answer in this question http://stackoverflow.com/questions/7446489/casting-a-pointer-does-not-produce-an-lvalue-why which will fix your lvalue woes. – nonsensickle Jul 03 '13 at 23:03
@Blagovest Buyukliev Besides, I don't see how this answer answers the given question. He never mentioned a buffer. It is a partial answer at best. Please update your answer with more information about at least constructing the buffer. – nonsensickle Jul 03 '13 at 23:07
It should read `int x = *(*(int **)&buf)++;` or, with [the help of LVALUE_CAST](http://stackoverflow.com/a/41401498), `int x = *LVALUE_CAST(int *,buf)++;`. – Tino Dec 30 '16 at 19:22

Brian McFarland · Answer 3 · 2011-09-16T16:44:42.803

I believe you are referring to Java's ByteBuffer methods.

Note that if you are operating on the same data processed by those methods, Java defaults to BIG endian (network byte order) regardless of the host's native byte order. Unless you know for sure that it's not, your C code is probably compiling to run on a LITTLE endian machine. Some rough guidelines if you're not sure: x86 (most PCs) is LE. ARM can be either, but is usually LE. PowerPC and Itanium can also be either; PowerPC is traditionally BE.

Also, never cast a char * or void * to a pointer to any type larger than 1 byte and dereference it unless you know it's properly aligned. It can cause a bus fault or similar error if it's not.

So here would be my getInt() impl, assuming a BE/network byte-order (e.g. produced by Java) buffer. My apologies for being terse.

#include <stdio.h>  /* printf(), used by test() below */

typedef struct ByteBuffer {
    const char * buffer;   /* Buffer base pointer */
    int          nextByte; /* Next byte to parse */
    int          size;     /* Size of buffer */
} ByteBuffer_t;

/* Get int from byte buffer, store results in 'i'. Return 0 on success, -1 on error */
int getInt(ByteBuffer_t * bb, int * i) {
   const unsigned char * b;
   if( (bb->nextByte + 3) < bb->size ) {
      /* Use unsigned bytes so values >= 0x80 are not sign-extended */
      b = (const unsigned char *) &(bb->buffer[bb->nextByte]);
      /* Read as big-endian value */
      *i = (int)( ((unsigned)b[0] << 24) | ((unsigned)b[1] << 16) | ((unsigned)b[2] << 8) | (unsigned)b[3] );
      bb->nextByte += 4;
      return 0;
   } else {
      return -1;
   }
}


void test(const char * buf, int bufSize) {
   ByteBuffer_t bb;
   int ival;

   bb.buffer = buf;
   bb.size   = bufSize;
   bb.nextByte = 0;

   while(1) {
      if( 0 == getInt(&bb, &ival) )
          printf("%d\n", ival);
      else
          break;     
   }
}

EDIT: Removed ntohl() call.... it didn't belong if your source data was really big endian. If it worked w/ that call in there, you probably need to swap the byte order on the shift-pack, which means it will be parsing little-endian byte streams instead.
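
The same shift-and-pack approach extends to the other cases mentioned here. Below is a rough sketch of a little-endian 32-bit read (the swapped byte order described in the edit note) and a big-endian 64-bit read in the spirit of getLong(). The Reader type and the names get_u32_le and get_u64_be are assumptions for illustration only, and the sketch assumes pos never exceeds size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor type, mirroring the ByteBuffer struct above. */
typedef struct {
    const unsigned char *data; /* Buffer base pointer */
    size_t               pos;  /* Next byte to parse (pos <= size) */
    size_t               size; /* Size of buffer */
} Reader;

/* Read a 32-bit little-endian value. Returns 0 on success, -1 if too short. */
int get_u32_le(Reader *r, uint32_t *out)
{
    const unsigned char *b;

    assert(r != NULL && out != NULL);
    if (r->size - r->pos < 4)
        return -1;
    b = r->data + r->pos;
    /* Least significant byte comes first in the stream. */
    *out = (uint32_t)b[0] | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    r->pos += 4;
    return 0;
}

/* Read a 64-bit big-endian value. Returns 0 on success, -1 if too short. */
int get_u64_be(Reader *r, uint64_t *out)
{
    uint64_t v = 0;
    size_t k;

    assert(r != NULL && out != NULL);
    if (r->size - r->pos < 8)
        return -1;
    for (k = 0; k < 8; k++)
        v = (v << 8) | r->data[r->pos + k];  /* most significant byte first */
    r->pos += 8;
    *out = v;
    return 0;
}
```

On failure neither function moves the cursor, so a caller can stop cleanly at a truncated record.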

score 0 · Answer 4 · answered Sep 16 '11 at 12:16

There is a getchar() function.

The standard way to read formatted input in C is scanf():

scanf("<format specifier string>", input param1, param2, ...)

Take a look at http://www.cplusplus.com/reference/clibrary/cstdio/scanf/
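
Note that scanf() and its relatives parse text (the characters "1234"), not the raw bytes of an int. A minimal sketch using sscanf(), the string-reading sibling of scanf(); the wrapper name parse_int_text is an assumption for illustration:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse a decimal integer from text.
 * Returns 1 on success, 0 if no integer could be parsed. */
int parse_int_text(const char *text, int *out)
{
    assert(text != NULL && out != NULL);
    /* sscanf returns the number of fields it converted. */
    return sscanf(text, "%d", out) == 1;
}
```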

answered Sep 16 '11 at 12:16

Jan S

That isn't what the OP wants. – Kerrek SB Sep 16 '11 at 13:27
