We are rewriting our legacy code in C to C++. At the core of our system, we have a TCP client, which is connected to master. Master will be streaming messages continuously. Each socket read will result in say an N number of message of the format - {type, size, data[0]}
.
Now we don't copy these messages into individual buffers - but just pass the pointer the beginning of the message, the length and shared_ptr to the underlying buffer to a workers.
The legacy C version was single threaded and would do an inplace NTOH conversion like below:
struct Message {
uint32_t something1;
uint16_t something2;
};
process (char *message)
Message *m = (message);
m->something1 = htonl(m->something1);
m->something2 = htons(m->something2);
And then use the Message.
There are couple of issues with following the logging in new code.
Since we are dispatching the messages to different workers, each worker doing an ntoh conversion will cause cache miss issues as the messages are not cache aligned - i.e there is no padding b/w the messages.
Same message can be handled by different workers - this is the case where the message needs to processed locally and also relayed to another process. Here the relay worker needs the message in original network order and the local work needs to convert to host order. Obviously as the message is not duplicated both cannot be satisfied.
The solutions that comes to my mind are -
Duplicate the message and send one copy for all relay workers if any. Do the ntoh conversion of all messages belonging to same buffer in the dispatcher itself before dispatching - say by calling a
handler->ntoh(message);
so that the cache miss issue is solved.Send each worker the original copy. Each worker will copy the message to local buffer and then do ntoh conversion and use it. Here each worker can use a thread-specific (thread_local) static buffer as a scratch pad to copy the message.
Now my question is
Is the option 1 way of doing ntoh conversion - C++sy? I mean the alignment requirement of the structure will be different from the char buffer. (we havent had any issue with this yet.). Using scheme 2 should be fine in this case as the scratch buffer can have alignment of max_align_t and hence should typecastable to any structure. But this incur copying the entire message - which can be quite big (say few K size)
Is there a better way to handle the situation?