Issue sending/Receiving vector of double over TCP socket (missing data)

Question

I am trying to send data from a vector over a TCP socket. I'm working with a vector that I fill with values from 0 to 4999, and then send it to the socket.

On the client side, I receive the data into a vector, then copy its contents to another vector until I have received all the data from the server.

The issue I'm facing is that sometimes I get all of the data, and sometimes only the values from 0 to 1625 are correct and everything after that is garbage until the end (please see the image below). In one run, for example, the values from 0 to 2600 were correct, 2601 to 3500 were garbage, and 3501 to 4999 were correct again.

[Image: file containing received data; the left column is the line number and the right column is the data.]

This is the server side :

vector<double> values2;
for(int i=0; i<5000; i++)
    values2.push_back(i);
skt.sendmsg(&values2[0], values2.size()*sizeof(double));

The function sendmsg :

void Socket::sendmsg(const void *buf, size_t len){

    int bytes=-1;

    bytes = send(m_csock, buf, len, MSG_CONFIRM);

    cout << "Bytes sent: " << bytes << endl;

}

Client side :

vector<double> final;
vector<double> msgrcvd(4096);

do{

    bytes += recv(sock, &msgrcvd[0], msgrcvd.size()*sizeof(double), 0);
    cout << "Bytes received: " << bytes << endl;

    //Get rid of the trailing zeros
    while(!msgrcvd.empty() && msgrcvd[msgrcvd.size() - 1] == 0){
        msgrcvd.pop_back();

    }

    //Insert buffer content into final vector
    final.insert(final.end(), msgrcvd.begin(), msgrcvd.end());


}while(bytes < sizeof(double)*5000);


//Write the received data in a txt file

for(int i=0; i<final.size(); i++)
    myfile << final[i] << endl;

myfile.close();

The outputs of the bytes are correct, the server outputs 40 000 when sending the data and the client also outputs 40 000 when receiving the data.

Removing the trailing zeros and then inserting the content of the buffer into a new vector is not very efficient, but I don't think it's the issue. If you have any clues on how to make it more efficient, it would be great!

I don't really know if the issue is when I send the data or when I receive it, and also I don't really get why sometimes (rarely), I get all the data.

Is `bytes` always a multiple of 8? — 1201ProgramAlarm, May 27 '19 at 19:07
@1201ProgramAlarm I guess it is, since 1 byte is 8 bits? — Juju, May 28 '19 at 08:42

1201ProgramAlarm · Accepted Answer · 2019-05-28T15:37:42.383

recv receives bytes, and doesn't necessarily wait for all the data that was sent. So you can be receiving part of a double.

Your code works if you receive complete double values, but will fail when you receive part of a value. You should receive your data in a char buffer, then unpack it into doubles. (Possibly converting endianness if the server and client are different.)

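#include <array>
#include <cstddef>
#include <cstring>       // For memcpy, memmove
#include <sys/socket.h>  // For recv

// `sock` and `final` are the socket and result vector from the question's client code.
std::array<char, 1024> msgbuf;
std::size_t bytes = 0;      // Total bytes received so far
std::size_t carryover = 0;  // Bytes of a partial double left over from the previous recv

do {
    // Append the new data after any leftover bytes from the previous pass
    ssize_t b = recv(sock, &msgbuf[carryover], msgbuf.size() - carryover, 0);
    if (b <= 0)
        break;              // Error, or the peer closed the connection
    bytes += b;
    std::size_t avail = b + carryover;
    const char *mp = &msgbuf[0];
    // Unpack every complete double currently in the buffer
    while (avail >= sizeof(double)) {
        double d;
        std::memcpy(&d, mp, sizeof(double));
        final.push_back(d);
        mp += sizeof(double);
        avail -= sizeof(double);
    }
    // Take care of the extra bytes.  Copy them down to the start of the buffer
    carryover = avail;
    std::memmove(&msgbuf[0], mp, carryover);
} while (bytes < sizeof(double) * 5000);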

This uses the memcpy approach to type punning from "What's a proper way of type-punning a float to an int and vice-versa?" to convert the received binary data to a double, and assumes the endianness of the client and server are the same.

Incidentally, how does the receiver know how many values it is receiving? You have a mix of hard coded values (5000) and dynamic values (.size()) in your server code.

Note: code not compiled or tested
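
As an alternative to the carry-over buffer above, when the number of values is known in advance (5000 in the question), one option is to keep calling recv until exactly that many bytes have arrived and write them straight into the vector's storage. The following is only a sketch: `recv_all` and `recv_doubles` are hypothetical helper names, and it assumes POSIX sockets plus the same endianness and double representation on both ends.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#include <sys/types.h>
#include <sys/socket.h>

// Hypothetical helper: call recv() repeatedly until exactly `len` bytes have arrived.
// Returns false on error or if the peer closes the connection early.
bool recv_all(int sock, void *buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    char *p = static_cast<char *>(buf);
    while (len > 0) {
        ssize_t n = recv(sock, p, len, 0);
        if (n <= 0)
            return false;              // error (-1) or orderly shutdown (0)
        p += n;                        // a short read just advances the cursor
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// Hypothetical helper: receive `count` doubles directly into `out`.
// Assumes sender and receiver share endianness and double format.
bool recv_doubles(int sock, std::vector<double> &out, std::size_t count) {
    out.resize(count);
    return recv_all(sock, out.data(), count * sizeof(double));
}
```

With the question's client this would be called as `recv_doubles(sock, final, 5000)`. Because a partial double simply stays in place until the next recv fills in the rest, no trailing-zero trimming or second buffer is needed.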

Thank you for your answer, there are several points that I don't understand in your code. First, when you call recv, you're storing the return value in "bytes b". I guess it is b = recv(...). Then you do "bytes += b", but bytes is a char array, so what does this line do? Also, to let the receiver know how many values it is receiving, I thought of putting the size in the first element of the vector and sending it from the server. I could also send the size first, then the data after. — Juju, May 28 '19 at 08:38
@Juju I fixed a few typos in my code. Either sending the size first, or using a fixed number of elements, is fine, but you should be consistent and not mix styles. — 1201ProgramAlarm, May 28 '19 at 15:22
Yeah, I understand. I'm working with bigger data arrays, so I implemented this vector just for testing. This is why I used hard-coded values. What is your `char *base = &msgbuf[0];` used for? I'll try your solution tomorrow and I'll let you know! — Juju, May 28 '19 at 15:34
@Juju the declaration of `base` was left over from an early version of the code. I removed its use as I worked on the code and neglected to remove the declaration. — 1201ProgramAlarm, May 28 '19 at 15:38

Yury Schkatula · Answer 2 · 2019-05-29T15:57:56.340

TL;DR: Never send raw data over a network socket and expect it to be properly received/unpacked on the other side.

Detailed answer: Networking is built on top of various protocols, and for good reason. Once you send something, there is no guarantee that your counterparty runs the same OS and the same software version. There is no universal standard for how primitive types are encoded at the byte level. There is no limit on how many intermediate nodes may be involved in delivering the data, and the data from each of your send() calls may travel via different routes. So you have to formalize the way you send the data, so that the other party knows the proper way to retrieve it from the socket.
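
To make the byte-level point concrete, here is a minimal sketch of one way to pin down the encoding of a single double: copy its bit pattern into a 64-bit integer and write the bytes most-significant first. The function names are hypothetical, big-endian order is just one possible convention, and the sketch assumes both ends use 64-bit IEEE 754 doubles.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes a 64-bit double");

// Hypothetical helper: write `value` as 8 bytes, most significant byte first.
void encode_double_be(double value, unsigned char out[8]) {
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);   // well-defined way to get the bit pattern
    for (int i = 0; i < 8; ++i)
        out[i] = static_cast<unsigned char>(bits >> (56 - 8 * i));
}

// Hypothetical helper: rebuild a double from 8 big-endian bytes.
double decode_double_be(const unsigned char in[8]) {
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits = (bits << 8) | in[i];
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Because the shifts operate on the integer value rather than on memory, the same code produces the same bytes regardless of the host's endianness.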

Simplest solution: use a header before your data. So, you plan to send 5000 doubles? Then first send a DWORD containing 40000 (5k elements, 8 bytes each -> 40k) and push all 5k doubles right after it. Your counterparty should then read 4 bytes from the socket first, interpret them as a DWORD, and so learn how many bytes will follow.

Next step: you may want to send not only doubles but ints and strings as well. In that case, you have to expand your header so it can indicate:
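
The size-prefixed layout described above can be sketched on plain byte buffers as follows. The function names are hypothetical, and writing the 4-byte size in big-endian order is an assumed convention that both sides would have to agree on; the doubles themselves are copied raw, so this still assumes the same double representation on both ends.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: build a frame of 4 size bytes (big-endian) followed by the raw doubles.
std::vector<unsigned char> encode_frame(const std::vector<double> &values) {
    const std::uint32_t size =
        static_cast<std::uint32_t>(values.size() * sizeof(double));
    std::vector<unsigned char> frame(4 + size);
    frame[0] = static_cast<unsigned char>(size >> 24);
    frame[1] = static_cast<unsigned char>(size >> 16);
    frame[2] = static_cast<unsigned char>(size >> 8);
    frame[3] = static_cast<unsigned char>(size);
    if (size > 0)
        std::memcpy(frame.data() + 4, values.data(), size);
    return frame;
}

// Hypothetical helper: interpret the 4 header bytes as the payload size.
std::uint32_t decode_size(const unsigned char header[4]) {
    return (static_cast<std::uint32_t>(header[0]) << 24) |
           (static_cast<std::uint32_t>(header[1]) << 16) |
           (static_cast<std::uint32_t>(header[2]) << 8) |
            static_cast<std::uint32_t>(header[3]);
}

// Hypothetical helper: unpack `size` payload bytes into doubles.
std::vector<double> decode_payload(const unsigned char *payload, std::uint32_t size) {
    std::vector<double> values(size / sizeof(double));
    if (!values.empty())
        std::memcpy(values.data(), payload, values.size() * sizeof(double));
    return values;
}
```

On a socket, the receiver would first loop on recv until it has the 4 header bytes, call `decode_size`, and then loop again until that many payload bytes have arrived before calling `decode_payload`.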

Total size of further data (so called payload size)
Kind of the data (array of doubles, string, single int etc)
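
A header carrying both fields could look roughly like the sketch below. Everything here is an illustrative assumption rather than an established format: the `PayloadKind` values, the 5-byte layout (1 kind byte followed by a 4-byte big-endian size), and the function names are all hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical set of payload kinds; the numeric values are arbitrary.
enum class PayloadKind : std::uint8_t { DoubleArray = 1, String = 2, Int32 = 3 };

struct FrameHeader {
    PayloadKind kind;            // what the payload contains
    std::uint32_t payload_size;  // number of payload bytes that follow
};

constexpr std::size_t kHeaderBytes = 5;  // 1 kind byte + 4 size bytes

// Serialize field by field so struct padding never reaches the wire.
std::array<unsigned char, kHeaderBytes> pack_header(const FrameHeader &h) {
    return {{static_cast<unsigned char>(h.kind),
             static_cast<unsigned char>(h.payload_size >> 24),
             static_cast<unsigned char>(h.payload_size >> 16),
             static_cast<unsigned char>(h.payload_size >> 8),
             static_cast<unsigned char>(h.payload_size)}};
}

// Returns false if the kind byte is not one of the known values.
bool unpack_header(const std::array<unsigned char, kHeaderBytes> &bytes,
                   FrameHeader &out) {
    if (bytes[0] < 1 || bytes[0] > 3)
        return false;
    out.kind = static_cast<PayloadKind>(bytes[0]);
    out.payload_size = (static_cast<std::uint32_t>(bytes[1]) << 24) |
                       (static_cast<std::uint32_t>(bytes[2]) << 16) |
                       (static_cast<std::uint32_t>(bytes[3]) << 8) |
                        static_cast<std::uint32_t>(bytes[4]);
    return true;
}
```

The receiver would read `kHeaderBytes` bytes first, reject unknown kinds, and then dispatch on `kind` to decide how to interpret the following `payload_size` bytes.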

Advanced solution: Take a look at ready-made solutions:

ProtoBuf https://developers.google.com/protocol-buffers/docs/cpptutorial
Boost.Serialization https://www.boost.org/doc/libs/1_67_0/libs/serialization/doc/index.html
Apache Thrift https://thrift.apache.org
YAS https://github.com/niXman/yas

Happy coding!

Thank you for your detailed solution! When sending the header as a DWORD, how can I indicate which type of data will come next? DWORD is defined as an unsigned long, so why read 4 bytes? — Juju, May 28 '19 at 17:27
DWORD is a double word, and a word is 2 bytes, so a DWORD has to be 4 bytes (you can use uint32_t with the same result). — Yury Schkatula, May 28 '19 at 21:22
Regarding the type indication: take a look at a type such as Variant (COM Variant or Boost::Variant); it usually uses an extra field to store an enum value https://learn.microsoft.com/en-us/windows/desktop/api/oaidl/ns-oaidl-tagvariant — Yury Schkatula, May 28 '19 at 21:26
