27

This question is related to, but not quite the same as, this question.

Are there any benefits to using std::vector<char> instead of std::string to hold arbitrary binary data, aside from readability-related issues?

i.e. Are there any tasks which are easier/more efficient/better to perform with a vector compared to a string?

user541686
  • In C++03 std::string the contiguity of its data was debatable. – PlasmaHH Jul 06 '12 at 08:53
  • @PlasmaHH: Since there isn't any implementation (that I know of) that is discontiguous, I'd be willing to assume it's contiguous for this question. – user541686 Jul 06 '12 at 08:55

6 Answers

26

Aside from readability (which should not be underestimated) I can think of a couple of minor performance/memory issues with using std::string over std::vector:

  • Some modern std::string implementations use the small string optimization. If you are storing data that's larger than the string's internal buffer, it becomes a pessimization, reducing the efficiency of copying, moving, and swapping [1] and increasing sizeof() for no benefit.

  • An efficient std::string implementation will always allocate at least 1 more byte than the current size for storing a terminating null (not doing so requires extra logic in operator[] to cope with str[size()]).

I should stress that both of these issues are very minor; their performance cost will more than likely be lost in the background noise. But you did ask.


[1] Those operations require branching on size() if the small string optimization is being used, whereas they don't in a good std::vector implementation.
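
The two points above can be poked at with a short sketch. The helper names (`has_trailing_null`, `check_trailing_null`) are made up for illustration, and the object sizes are implementation-defined, so no particular values are claimed here. The sketch assumes C++11 or later, where `s[s.size()]` is required to read as a null character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if the element one past the end reads as '\0'.
// Since C++11, s[s.size()] must refer to a null character.
bool has_trailing_null(const std::string& s) {
    return s[s.size()] == '\0';
}

// Size of the handle objects themselves, not of their heap buffers.
// Both values are implementation-defined; a string with a small-string
// buffer is often (not always) larger than a vector.
constexpr std::size_t string_object_size = sizeof(std::string);
constexpr std::size_t vector_object_size = sizeof(std::vector<char>);

// Small self-check on a buffer of arbitrary bytes.
void check_trailing_null() {
    const std::string data("\x01\x02\x03", 3);
    assert(data.size() == 3);
    assert(has_trailing_null(data));
}
```

Printing `string_object_size` and `vector_object_size` on your own toolchain is the quickest way to see whether the size difference applies to you.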

JoeG
  • Very interesting point about small strings, though I'm not yet convinced it's a disadvantage. :) Still, a great answer, thanks! +1 – user541686 Jul 06 '12 at 09:32
  • Where did you get the figures that say most implementations use small strings? It seems to me that libstdc++ does not use it, and in almost every project I have been involved in over the last decade, I have been using libstdc++ ... – PlasmaHH Jul 06 '12 at 10:06
  • @PlasmaHH: I've changed it to 'some'. – JoeG Jul 06 '12 at 10:12
  • Many times you need to pass a null-terminated string to legacy APIs. string has [`string::c_str()`](http://www.cplusplus.com/reference/string/string/c_str/) but vector does not. This is also why you need the extra space. – Remus Rusanu Jul 06 '12 at 10:45
2

Yes, vector<char> does have capabilities that string lacks.

Unlike string, vector<char> is guaranteed to preserve iterators, references, etc. during a swap operation. See: May std::vector make use of small buffer optimization?
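
A minimal sketch of that guarantee, assuming the default allocator. The helper name `pointer_follows_swap` is hypothetical. A pointer taken into one vector before the swap stays valid and afterwards refers to the other vector's storage; the same is not promised for std::string, where short contents may live inside the object itself.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: swaps two buffers and reports whether a pointer
// taken into `a` before the swap now points at the start of `b`.
bool pointer_follows_swap(std::vector<char>& a, std::vector<char>& b) {
    assert(!a.empty());            // need at least one element to point at
    const char* before = a.data(); // points at a's first element
    a.swap(b);                     // exchanges ownership of the buffers
    return before == b.data();     // the element now belongs to b
}
```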

user541686
1

Beyond readability, and ensuring another maintainer does not confuse the purpose of the std::string, there is not a lot of difference in function. You could of course consider char*/malloc as well, if efficiency is the only consideration.

One potential issue I can think of:

std::string is std::basic_string<char>, so its element type is fixed to char. If you later needed to handle another type (e.g. unsigned short) you might need to either:

  • Create your own typedef std::basic_string<unsigned short> (which moves you away from normal std::string handling)
  • Tentatively apply some reinterpret_cast logic in a setter.

With a vector you could simply change the container to a std::vector<unsigned short>.
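
As a rough sketch of that last option: the names `Sample`, `Buffer`, `byte_size`, and `from_bytes` are all made up for illustration. Keeping the element type behind one alias means a later change touches a single line, and raw bytes can be copied in with `std::memcpy` rather than reinterpret_cast.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical aliases: the element type is named in exactly one place.
using Sample = unsigned short;
using Buffer = std::vector<Sample>;

// Total size in bytes of the stored elements.
std::size_t byte_size(const Buffer& buf) {
    return buf.size() * sizeof(Sample);
}

// Copies `count` raw bytes into a Buffer. `count` must be a whole
// number of Samples; byte order is whatever the host uses.
Buffer from_bytes(const unsigned char* bytes, std::size_t count) {
    assert(count % sizeof(Sample) == 0);
    Buffer out(count / sizeof(Sample));
    if (count != 0) {
        std::memcpy(out.data(), bytes, count);
    }
    return out;
}
```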

seanhodges
  • Could you expand on the last part? What is the disadvantage of using `std::basic_string` compared to `std::vector`? – user541686 Jul 06 '12 at 09:31
  • One disadvantage is that it might not compile. :-) A `std::char_traits` specialization for `unsigned short` is not required by the standard. – Bo Persson Jul 06 '12 at 09:54
  • @Mehrdad your issues would be mainly portability to other platforms and compatibility with other libraries. You are no longer using a traditional std::string, since the standard defines only `char` and `wchar_t` as valid char_traits. Using something else could lead to undefined behaviour if you run a string operation on the contents. – seanhodges Jul 06 '12 at 10:26
1

As other answers mention, a vector could be marginally faster since it guarantees contiguous memory, even for small sizes, and doesn't add an extra null byte at the end. However, it is a lot simpler (code-wise) to concatenate two strings than it is to concatenate two vectors:

Using vector:

vector<char> a, b;
// ...
vector<char> c;
c.insert(c.end(), a.begin(), a.end());
c.insert(c.end(), b.begin(), b.end());

Using string:

string a, b;
// ...
string c = a + b;
Matthew D. Scholefield
  • The question asks for benefits of `vector` over `string`, not the other way around... weird to see you just quote other answers on that aspect and then post your own response for the reverse direction – user541686 Apr 10 '19 at 08:34
  • Hmm, perhaps this would be better suited for a different question. The reason for my answer was that this was the first result on Google for "vector versus string" so I thought I'd add an answer bringing up something not mentioned. – Matthew D. Scholefield Apr 10 '19 at 09:10
  • Oh I see. Yeah it's unfortunate, since I already had a laundry list of why I'd use `string` over `vector`, so that's specifically not the question I needed answered. – user541686 Apr 10 '19 at 10:00
0

I think the only benefit you would gain from doing that would be the ease of iterating over the std::vector of characters, but even that can be done with a std::string.

You have to remember that even though std::string seems like an object, it can be accessed like an array, so even accessing specific parts of a string can be done without the use of a std::vector.

Nathan White
0

Ideally one would use vector<unsigned char> to store arbitrary binary data - but I think you already knew this - as you referred to the old question.

Other than that, using vector would definitely be more memory efficient, as string would add a terminating Nul character. Performance might also improve as the allocation mechanism is different for both - vectors guarantee contiguous memory!

Besides that, using a string would not be correct, as callers/users might inadvertently invoke some of the string methods, which could be a disaster.

go4sri
  • Would you mind expanding on the last paragraph? What is the 'disaster'? – user541686 Jul 06 '12 at 09:29
  • Consider an example: you have binary data which has multiple nul characters. If a user calls .length(), he would get some answer - which will in all probability be wrong, and he will never be alerted to the fact that it is binary data and not a string. – go4sri Jul 06 '12 at 09:36
  • Why is that wrong? It seems like you're saying it would work correctly, except that it might be unreadable (i.e. misleading). That's fine, but that wasn't the point of my question -- I specifically said issues *except* readability. – user541686 Jul 06 '12 at 09:38
  • @go4sri: Calling `length()` on a string with nul characters should give you the correct length. The problems arise when users start using `c_str()` and then wonder why their strings are truncated. – tinman Jul 06 '12 at 09:40
  • @Mehrdad - I do not think this comes under readability, but if you are not concerned with this kind of error, then you can skip it. – go4sri Jul 06 '12 at 09:43
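
The `length()` versus `c_str()` point from the comments can be checked with a small sketch. The helper names (`make_binary_sample`, `c_api_length`) are hypothetical: the first builds a 5-byte buffer with embedded NULs, the second measures it the way a NUL-terminated C API would.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Builds a 5-byte buffer with embedded NULs, using the
// (pointer, count) constructor so no byte is dropped.
std::string make_binary_sample() {
    const char raw[] = {'a', '\0', 'b', '\0', 'c'};
    return std::string(raw, sizeof raw);
}

// Length as seen by a C API that stops at the first NUL.
std::size_t c_api_length(const std::string& s) {
    const std::size_t n = std::strlen(s.c_str());
    assert(n <= s.size()); // strlen can only see a prefix of the data
    return n;
}
```

For the sample buffer, `size()` and `length()` both report 5, while `c_api_length` reports 1: the data is intact inside the string, and the truncation only appears once it is handed to code that treats it as a C string.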