3

I have program which allocates a 32-bit int and subsequently tries to read 4 bytes from a socket into the int using read(2)

Sometimes the read is incomplete and returns having read say 2 bytes. Is there any way of recovering from this? I suppose I have to produce a pointer halfway into the int to be able to perform another read.

How are you supposed to handle this situation? I can imagine a couple of ugly ways, but no elegant one.

Alexander Torstling
  • 18,552
  • 7
  • 62
  • 74

3 Answers3

3

You will first need to ensure to have read the 4 bytes. You do this using function similar to this one (slightly modified):

#include <sys/types.h>
#include <sys/socket.h>

int readall(int s, char *buf, int *len)
{
    int total = 0;        // how many bytes we've read
    int bytesleft = *len; // how many we have left to read
    int n = -1;

    while(total < *len) {
        n = read(s, buf+total, bytesleft, 0);
        if (n <= 0)  { break; }  
        total += n;
        bytesleft -= n;
    }

    *len = total; // return number actually read here

    return (n<=0)?-1:0; // return -1 on failure, 0 on success
} 

And afterwards you can assemble an integer using these four bytes.

Giorgi Moniava
  • 27,046
  • 9
  • 53
  • 90
  • 1
    You need to break on `n == 0` as well, otherwise you will loop forever at end of stream. – user207421 Aug 26 '15 at 16:28
  • @EJP:You mean in case the other side disconnects? Ok I will amend the code(and assume failure if that is the case by returning -1, OP can change it for his needs). – Giorgi Moniava Aug 26 '15 at 16:36
  • @EJP: And there can be no valid cases when the return is 0 and say on next recv maybe it reads something? – Giorgi Moniava Aug 26 '15 at 16:39
  • I think it's reasonable for us to ignore timeouts in our answers; in real life, we'd want to use timeouts to see whether we get the requested data in a reasonable time. If we expect 4 bytes and we call read and get 0, I'd still wait for the timeout, personally. – Patrick87 Aug 26 '15 at 16:42
  • Thanks. You didn't have to post all that code though, I just wondered if it's considered ok to have an int as buffer and how to handle recovery in that case. I guess are saying no, but would you care to elaborate why? Is there no portable way of treating an int as a byte buffer? – Alexander Torstling Aug 26 '15 at 17:14
  • @AlexanderTorstling: well you could make a char pointer to int and advance it by as many bytes you receive and only consider all done when all 4 bytes are read? – Giorgi Moniava Aug 26 '15 at 17:18
  • @AlexanderTorstling The answer is that yes, you can have an int as a buffer, as I demonstrate in my answer. You can handle recovery by treating the memory it occupies a byte array and indexing into it. Is that OK? No, I would say, for a few reasons - it's hard to read, hard to understand, unnecessary, and breaks if you're communicating data between different kinds of systems (see my comment on your question). What do you mean by *portable*? Treating an array of four bytes as an integer is not portable across systems in general. As in, it will represent different numbers on different systems. – Patrick87 Aug 26 '15 at 17:37
  • @AlexanderTorstling:yeah or smth similar to Patrick's answer and make cast to char pointer to avoid problems due to pointer arithmetic (with int pointers +1 will add four bytes as you may know) – Giorgi Moniava Aug 26 '15 at 17:41
  • Alright guys. Saw this question http://stackoverflow.com/questions/4914183/whats-the-correct-way-to-add-1-byte-to-a-pointer-in-c-c which suggest a couple of ways of treating an int as a byte. I guess you have a point in it being hard to read though. The current code which I have (am maintaining) does a ntohl on the resulting int to cover for byte differences. What do you suggest I do then? Read byte array and bitshift into int, then ntohl? Or union of int, byte array and then ntohl? – Alexander Torstling Aug 26 '15 at 17:51
  • @AlexanderTorstling: There is no need for htonl if you use something like this: http://stackoverflow.com/questions/8173037/convert-4-bytes-char-to-int32-in-c, but you need to know the endianness of the data that comes in – Giorgi Moniava Aug 26 '15 at 18:05
  • @Giorgi: The data will be in network order. Had to read up on endianness and bitshift independence of endianness and now my understanding of what I need to do is the following: I read 4 bytes. I'll know that the byte order is big endian, so I'll get MSB first (index 0 in the array). I'll be able to construct the int by bit-shifting. Index 0 shifts 32 bits and so forth. This bit shift will work since bit shift isn't defined in terms of manipulating memory bit patterns but instead in terms of multiplication. Thus bit shift reconstruction will work the same irrespective of endianness. Correct? – Alexander Torstling Aug 26 '15 at 19:57
  • @AlexanderTorstling: For big endian data you will need something like this: `int i = (b[3] ) | ((unsigned)b[2] << 8) | ((unsigned)b[1] << 16) | ((unsigned)b[0] << 24);` – Giorgi Moniava Aug 27 '15 at 06:26
  • Minor: Better to `int n = -1` to cope with `*len <= 0`. Else `n` is undefined at function's end. – chux - Reinstate Monica Aug 31 '15 at 15:35
  • @chux: fair, modified. – Giorgi Moniava Aug 31 '15 at 16:51
  • To be more clear, if code is pathologically called with `len = 0 /* or something less */; readall(..., ... &len)`, then the `while(total < *len)` loop is never entered and code becomes `int n; ... return n==-1?-1:0;` leading to undefined behavior. Simple setting `int n = -1` copes with that. Corner cases like this are rarely encountered, but have a way of coming up late in a project or being exploited by hackers. – chux - Reinstate Monica Aug 31 '15 at 17:25
  • @chux:ok, I see but it is in programmers interest to call readall with meaningful length value. Unlikely he/she will call readall(...,0) - that would mean he/she expects nothing from server. – Giorgi Moniava Aug 31 '15 at 18:08
  • Agree unlikely, yet `len = 0; /* more code */ readall(..., ..., len)` is not unreasonable nor is the fix expensive. Consider `memcpy(x,y, 0), malloc(0), fread(ptr, 0, n,stream)`. None of these invoke undefined behavior because they are called with 0. – chux - Reinstate Monica Aug 31 '15 at 18:25
0

Disclaimer: this is bad code. Don't do it.

int value = 0;
int bytes = 0;
while ( (bytes += read(((char*)&value)+bytes, 4-bytes)) < 4 );
Patrick87
  • 27,682
  • 3
  • 38
  • 73
0

So with the information here and at What's the correct way to add 1 byte to a pointer in C/C++? I concluded that:

  1. Addressing into an int can cause alignment problems on some architectures.
  2. You can try to circumvent this by using type punning (a union between an int and a char[]), but in practice this technique seems no more reliable than 1.
  3. The only truly portable way of doing byte pointer arithmetic is using a char* into a char[]. Consequently the correct way of handling this situation is to read all the bytes into the array and construct the int afterwards.
  4. Reconstruction into an int can be done by ntohl or bit-shifting. Bitshifting is portable between host machine endianness since it is defined in terms of multiplication.

So, where does that leave us? Well, the portable option is to read into a char[] and reconstruct with bitshifting or ntohl. The non-portable way is to point a char * into your int as @Patrick87 suggested.

I went down the pragmatic road and just created a char* into my int. Worked fine on my target platform.

Community
  • 1
  • 1
Alexander Torstling
  • 18,552
  • 7
  • 62
  • 74