What does serialization mean and why does it perform better when used with sockets?

Question

I have a program that communicates over sockets, sending and receiving data, mostly strings of 100 bytes each (`char str[100]`). The code works and the data transfers through the socket without problems.

Later I read about serialization.

What does serialization mean and why is it advantageous to send / receive data through sockets using this method? Is there any consistent method to serialize any type of data? Given the example of sending char str[100], how can I serialize it?

Yeah, what is the difference between sending a serialized struct and just sending the data needed? — user157629, Jan 04 '21 at 15:47
Does this answer your question? [Is Serialization the best for sending data over a socket?](https://stackoverflow.com/questions/9458871/is-serialization-the-best-for-sending-data-over-a-socket) — Rohan Asokan, Jan 04 '21 at 15:51
Some computers organize the 2 bytes of a `short` so that the MSB precedes the LSB (most/least significant bytes) — these are big-endian machines. Others organize it so that the LSB precedes the MSB — these are little-endian machines. If you send the data unserialized between two machines of different endianness, then you get confusion. Things are more complex for bigger integer types. Serialization prevents this confusion. In principle, different systems might have different representations for floating-point types too. — Jonathan Leffler, Jan 04 '21 at 15:52

Clifford · Accepted Answer · 2021-01-04T16:10:26.090

It is possible that the data in `str[100]` is already serialised; without any semantics defined for its content, it is not possible to tell. The question to ask yourself is: can any arbitrary receiving system interpret that data correctly, regardless of its natural byte order, word size or floating-point encoding?

Two systems must agree on the structure of the data they exchange, because they may represent specific data types differently and their compilers may pack structures with different amounts of padding. If you send a raw struct, you cannot be certain that the receiving system will interpret it in the same manner, due to possible differences in byte order, structure packing, word alignment, word size, and the binary encoding of floating-point and even signed data.

Structure packing and alignment directives may deal with the possibility of different packing and alignment, but they do not solve the problem of different byte order or data type sizes (although using fixed-width data types may help), or differences in floating-point or signed integer representation.
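As a rough illustration of the padding issue, the sketch below compares the bytes a structure's fields need with the size the compiler actually gives it. The `struct sample` type and the helper names are hypothetical, chosen only for this example. On many common ABIs the structure occupies 12 bytes rather than 7, but the language does not guarantee either figure, which is exactly why a raw struct is not a reliable wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, used only for illustration. */
struct sample {
    uint8_t  tag;    /* 1 byte */
    uint32_t value;  /* 4 bytes, usually aligned to a 4-byte boundary */
    uint16_t count;  /* 2 bytes */
};

/* Bytes the fields themselves need: 1 + 4 + 2. */
enum { SAMPLE_FIELD_BYTES = 7 };

/* Number of padding bytes this compiler added to the in-memory layout. */
size_t sample_padding(void)
{
    assert(sizeof(struct sample) >= SAMPLE_FIELD_BYTES);
    return sizeof(struct sample) - SAMPLE_FIELD_BYTES;
}

/* Offset of `value` in memory; greater than 1 if padding follows `tag`. */
size_t sample_value_offset(void)
{
    return offsetof(struct sample, value);
}
```

Printing `sample_padding()` and `sample_value_offset()` on two different platforms or with different packing options is a quick way to see whether their in-memory layouts agree.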

So, to solve all these issues, the precise arrangement of bytes and the encoding of data must be agreed between the systems, and that arrangement enforced (by serialisation) regardless of the "natural" arrangement and architecture of either system.

Specifying the data offset, byte order, word width and encoding of each data field, and enforcing them in code rather than assuming both systems are the same, is one form of serialisation. In other cases you might convert all the data to some other encoding, such as comma-separated ASCII data or XML, which is less efficient but far less ambiguous.
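A minimal sketch of that first form might look like the following. The `struct reading` message, its 6-byte wire layout (big-endian, two's complement for the signed field) and the function names are all assumptions made for illustration, not part of any real protocol. Each field is written to a fixed offset one byte at a time, so the result does not depend on the host's byte order, padding or signed representation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message, used only for illustration. */
struct reading {
    uint16_t sensor_id;
    int32_t  value;
};

/* Assumed wire layout: bytes 0-1 sensor_id, bytes 2-5 value, big-endian. */
enum { READING_WIRE_SIZE = 6 };

/* Write each field at its fixed offset, most significant byte first. */
void reading_serialise(const struct reading *r,
                       unsigned char buf[READING_WIRE_SIZE])
{
    assert(r != NULL && buf != NULL);
    /* Conversion to unsigned is well defined and yields the
       two's complement bit pattern of the value. */
    uint32_t v = (uint32_t)r->value;

    buf[0] = (unsigned char)(r->sensor_id >> 8);
    buf[1] = (unsigned char)(r->sensor_id & 0xFF);
    buf[2] = (unsigned char)(v >> 24);
    buf[3] = (unsigned char)((v >> 16) & 0xFF);
    buf[4] = (unsigned char)((v >> 8) & 0xFF);
    buf[5] = (unsigned char)(v & 0xFF);
}

/* Rebuild the fields from the agreed layout, whatever the host order. */
void reading_deserialise(const unsigned char buf[READING_WIRE_SIZE],
                         struct reading *r)
{
    assert(r != NULL && buf != NULL);
    uint32_t v = ((uint32_t)buf[2] << 24) | ((uint32_t)buf[3] << 16) |
                 ((uint32_t)buf[4] << 8)  |  (uint32_t)buf[5];

    r->sensor_id = (uint16_t)(((unsigned)buf[0] << 8) | buf[1]);
    /* Decode two's complement without relying on the host's
       signed representation. */
    r->value = (v <= INT32_MAX) ? (int32_t)v
                                : -(int32_t)(uint32_t)~v - 1;
}
```

The sender would pass the 6-byte buffer to `send()` and the receiver would fill it from `recv()` before calling `reading_deserialise()`; neither side ever transmits the in-memory `struct` itself.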

John Bode · Answer 2 · 2021-01-04T16:24:23.760

"Serialization" is the process of formatting data using a common protocol that's agreed upon by both the sender and receiver. The problem with sending raw binary data over a socket is that different systems can interpret the same sequence of bytes very differently (big-endian vs. little-endian, two's complement vs. ones' complement, 16-bit vs. 32-bit words, etc.). Raw binary data sent from an x86 system would be unintelligible to a SPARC system and vice versa.

So the sender has to format the data per the protocol (serialize) before sending it out over the socket, and likewise the receiver has to convert the formatted data to its internal representation (deserialize) upon receipt.

Serialization can apply to non-binary data as well. I work on an online banking application that communicates with a number of back-end processors that use different protocols for communicating account or transaction data (account number, balance, next payment due, etc.). Some use fixed-length text fields, some use variable-length fields with delimiters like commas or tildes, some use XML, some use JSON, etc. The point is both we and the back-end processor agree on the order and formatting of the various items when passing that information back and forth, even though our internal representation of that data is quite different.
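To make the text-based case concrete, here is a small sketch of a comma-delimited format. The `struct account` record, its two fields and the `number,balance` line format are hypothetical and not taken from any real back-end protocol. The point is only that both sides agree on the field order and formatting, independently of how each stores the values internally.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical account record; field names are illustrative only. */
struct account {
    unsigned long number;
    long          balance_cents;
};

/* Format the record as "number,balance\n".
   Returns the length written, or -1 if buf is too small. */
int account_to_text(const struct account *a, char *buf, size_t size)
{
    assert(a != NULL && buf != NULL);
    int n = snprintf(buf, size, "%lu,%ld\n", a->number, a->balance_cents);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Parse the same format back into a record.
   Returns 0 on success, -1 if the text does not match. */
int account_from_text(const char *text, struct account *a)
{
    assert(text != NULL && a != NULL);
    return sscanf(text, "%lu,%ld", &a->number, &a->balance_cents) == 2
               ? 0 : -1;
}
```

A text format like this costs more bytes and more parsing than a packed binary layout, but it can be read by any system that handles ASCII and it is easy to inspect when debugging.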

If you're just sending individual strings of plain text, then serialization isn't an issue.
