0

I have a project where I transfer data between client and server using boost.asio sockets. Once one side of the connection receives data, it converts it into a std::vector of std::strings, which then gets passed on to the actual recipient object via previously defined "callback" functions. This works fine so far, except that I am currently using functions like atoi() and to_string to convert data types other than strings into a sendable format and back. This approach is of course a bit wasteful in terms of network usage (especially when transferring larger amounts of data than just single ints and floats). Therefore I'd like to serialize and deserialize the data. Since, effectively, any serialisation method will produce a byte array or buffer, it would be convenient for me to just use std::string instead. Is there any disadvantage to doing that? I don't see why there should be one, since strings should be nothing more than byte arrays.

l'arbre
  • _"Is there any disadvantage to doing that?"_ No. Maybe a `std::vector` might be semantically clearer. – πάντα ῥεῖ May 24 '17 at 18:41
  • `std::string` pretty much has to null-terminate its buffer as far as I can tell, whereas `std::vector` wouldn't have to. Probably not enough of a performance impact to worry about, though, compared to the extra functionality `std::string` makes available. – Daniel Schepler May 24 '17 at 18:41
  • @DanielSchepler I thought `std::string` isn't null terminated, only `string::c_str` and `string::data` gives you a null terminated sequence – Passer By May 25 '17 at 03:28
  • But `string::c_str` is documented to be constant-time at least at cppreference.com, and I don't see how you would achieve that aside from maintaining the string data with a null terminator after it. – Daniel Schepler May 25 '17 at 05:41

4 Answers

6

In terms of functionality, there's no real difference.

Both for performance reasons and for code clarity reasons, however, I would recommend using std::vector<uint8_t> instead, as it makes it far more clear to anyone maintaining the code that it's a sequence of bytes, not a String.
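As a rough illustration of what a byte-oriented buffer could look like, here is a minimal sketch that packs a 32-bit integer into a `std::vector<std::uint8_t>` instead of converting it with `to_string`. The names `ByteBuffer`, `append_u32` and `read_u32` are hypothetical and not part of any library, and the big-endian wire order is an assumption; any fixed order works as long as both sides agree.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical alias: makes it obvious that the buffer holds raw bytes.
using ByteBuffer = std::vector<std::uint8_t>;

// Appends a 32-bit value in big-endian (network) byte order.
inline void append_u32(ByteBuffer& out, std::uint32_t value) {
    out.push_back(static_cast<std::uint8_t>(value >> 24));
    out.push_back(static_cast<std::uint8_t>(value >> 16));
    out.push_back(static_cast<std::uint8_t>(value >> 8));
    out.push_back(static_cast<std::uint8_t>(value));
}

// Reads a 32-bit big-endian value starting at `offset`.
// Throws std::out_of_range if fewer than 4 bytes are available there.
inline std::uint32_t read_u32(const ByteBuffer& in, std::size_t offset) {
    if (offset > in.size() || in.size() - offset < 4) {
        throw std::out_of_range("buffer too short for a 32-bit value");
    }
    return (static_cast<std::uint32_t>(in[offset]) << 24) |
           (static_cast<std::uint32_t>(in[offset + 1]) << 16) |
           (static_cast<std::uint32_t>(in[offset + 2]) << 8) |
           static_cast<std::uint32_t>(in[offset + 3]);
}
```

A value written with `append_u32(buf, 42)` always occupies exactly four bytes and can be read back with `read_u32(buf, 0)`. A buffer like this can also be handed to `boost::asio::buffer(buf)` for sending.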

Xirema
4

You should use std::string when you work with strings; when you work with a binary blob, you are better off using std::vector<uint8_t>. There are many benefits:

  • your intention is clear so code is less error prone

  • you would not pass your binary buffer as a string to a function that expects std::string by mistake

  • you can overload operator<< for std::ostream for this type to print the blob in a proper format (usually a hex dump). It is very unlikely that you would want to print a binary blob as a string.

There could be more. The only benefit of std::string that I can see is that you do not need a typedef.
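The hex-dump point in the list above could be sketched roughly as follows. Note that this uses a thin wrapper struct (the hypothetical name `Blob`) rather than a plain typedef; that is an assumption made here so the `operator<<` overload belongs to a distinct type and is found reliably, not something the answer prescribes.

```cpp
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <ostream>
#include <vector>

// Hypothetical wrapper type: a distinct type for binary data, so it cannot be
// passed to a function expecting std::string by accident.
struct Blob {
    std::vector<std::uint8_t> bytes;
};

// Prints the blob as space-separated, two-digit lowercase hex values.
inline std::ostream& operator<<(std::ostream& os, const Blob& blob) {
    // Save the stream's formatting state so the caller's settings survive.
    const std::ios_base::fmtflags saved_flags = os.flags();
    const char saved_fill = os.fill('0');
    os << std::hex;
    for (std::size_t i = 0; i < blob.bytes.size(); ++i) {
        if (i != 0) {
            os << ' ';
        }
        // Cast to unsigned so the byte is printed as a number, not a character.
        os << std::setw(2) << static_cast<unsigned>(blob.bytes[i]);
    }
    os.fill(saved_fill);
    os.flags(saved_flags);
    return os;
}
```

With this in place, streaming a `Blob` holding the bytes 0x00, 0xff and 0x1a to `std::cout` writes `00 ff 1a`.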

Slava
1

You're right. Strings are nothing more than byte arrays. std::string is just a convenient way to manage the buffer array that represents the string. That's it!

There's no disadvantage to using std::string unless you are working on something REALLY REALLY performance critical, like a kernel, for example... then working with std::string would have a considerable overhead. Besides that, feel free to use it.
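One practical caveat when treating a std::string as a byte buffer: serialized data often contains zero bytes, and the `const char*` constructor stops at the first one. A minimal sketch, with hypothetical helper names, showing the difference:

```cpp
#include <cstddef>
#include <string>

// Builds a std::string from raw bytes. The length is passed explicitly,
// so embedded zero bytes are preserved.
inline std::string bytes_to_string(const char* data, std::size_t size) {
    return std::string(data, size);
}

// Pitfall: this constructor treats `data` as a C string and stops at the
// first zero byte, silently dropping everything after it.
inline std::string bytes_to_string_truncating(const char* data) {
    return std::string(data);
}
```

For the four bytes `'a', '\0', 'b', '\0'`, the first helper yields a string of size 4, while the second yields a string of size 1. As long as the size is always carried along explicitly, std::string holds arbitrary bytes without trouble.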

--

Behind the scenes, a std::string needs to do a number of checks on its own state in order to decide whether it is using the small-string optimization or not. Today pretty much all standard library implementations use small-string optimizations. They all use different techniques, but basically the string needs to test bit flags that tell whether its data lives inside the object itself or on the heap. This overhead doesn't exist if you use char[] directly. But again, unless you are working on something REALLY critical, like a kernel, you won't notice anything, and std::string is much more convenient.

Again, this is just ONE of the things that happen under the hood, given as an example to show the difference between them.

Wagner Patriota
  • yes, if you use `std::string` in the kernel level the overhead is very considerable. Here is an example... but there are many more out there: https://stackoverflow.com/questions/21946447/how-much-performance-difference-when-using-string-vs-char-array – Wagner Patriota May 24 '17 at 18:58
  • @Ðаn I don't personally know the details, but there is a small amount of extra overhead in `std::string` because it has several constraints it needs to conform to, including but not limited to the fact that it needs to always have an extra byte allocated to null-terminate the string. At the same time though, `std::string` objects can be subject to "Small String Optimizations", which can improve the memory footprint. The critical point to take away is that `std::string` can do things under-the-hood that you might not expect. – Xirema May 24 '17 at 18:58
  • @Xirema, about the null-terminating char, both C-strings and `std::string` have one, so this is not the issue. The overhead is associated with the code necessary to construct and delete the string. For example, it needs to handle the case for small string optimizations, etc... and this makes `std::string` a little heavy! I will update the answer with details. – Wagner Patriota May 24 '17 at 19:01
  • @WagnerPatriota We're not comparing `std::string` and "C-Strings" though, we're comparing `std::string` and `char[]` or `std::vector`. `char[]` and `std::vector` do not allocate and manage the null terminating character automatically; it needs to be manually added by the user (or, more likely, ignored, since no good String use depends on it). – Xirema May 24 '17 at 19:03
  • @Ðаn Well, again, I don't know all the details. I only know that there are various things that affect `std::string` that `std::vector` is happy to ignore, that have impacts on performance. – Xirema May 24 '17 at 19:05
  • reading the question... it is about "a std::vector OF std::strings" AND "disadvantage of using std::string versus byte array". it's straightforward. – Wagner Patriota May 24 '17 at 19:06
-2

Depending on how often you're firing network messages, std::string should be fine. It's a convenience class that handles a lot of char work for you. If you have a lot of data to push, though, it might be worth using a char array directly and treating it as raw bytes, just to minimise the extra overhead std::string has.

Edit: if someone could comment and point out why you think my answer is bad, that'd be great and help me learn too.

forsamori