3

One of the highest voted questions under the c++ tag is called "Splitting a string in C++". In it, the questioner asks: "What's the most elegant way to split a string in C++?".

The highest voted answer to this question provides these two functions:

std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems) {
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}


std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    return split(s, delim, elems);
}

These functions work great. But I am trying to understand why the answerer didn't combine these two functions into one function. Is there some performance, usability or readability benefit that I am not seeing that you miss out on when you combine these functions? Full program with combined function is below:

#include <iostream>
#include <vector>
#include <string>
#include <sstream>

using namespace std;    
// splitting them into two seperate functions is unnecessary it seems to me, and makes the underlying function harder to understand.
std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}

int main()
{
    std::vector<std::string> x = split("one:two::three", ':');
    for (int i = 0; i<x.size(); ++i) cout << x[i] << '\n';
    return 0;
}

I find this function split apart to be exceedingly less elegant - and harder to understand - but I feel like I must be missing something. Why didn't he just combine them?

Community
  • 1
  • 1
Stepan1010
  • 3,136
  • 1
  • 16
  • 21

2 Answers2

4

Imagine that you are going to split a bunch of different sources, but want the results to all end up in one container. For that, you don't want the function to always allocate a new container for you:

Before, requires extra work:

std::vector<std::string> sources = /* populate */;

std::vector<std::string> results;

for (const auto& source : sources)
{
    auto result = split(source, ":");

    // or maintain a vector of vectors...yuck
    results.insert(std::make_move_iterator(result.begin()),
                   std::make_move_iterator(result.end()));
}

After, straightforward:

std::vector<std::string> sources = /* populate */;

std::vector<std::string> results;

for (const auto& source : sources)
{
    split(source, ":", results);
}

Or: imagine you're going to split a bunch of difference sources, and for efficiency you want to minimize memory allocation (your profiler says you allocate too much here, for example). So you reuse the same vector over and over, to avoid subsequent memory allocations after the first split.

Before, slow:

std::vector<std::string> sources = /* populate */;

for (const auto& source : sources)
{
    auto result = split(source, ":");

    process(result);
}

After, better:

std::vector<std::string> sources = /* populate */;

std::vector<std::string> result;
for (const auto& source : sources)
{
    result.clear();
    split(source, ":", result);

    process(result);
}

Of course, the simplicity of having the container created for you is nice in the general case, and we can easily reuse the more general function to create the second function at little cost.

GManNickG
  • 494,350
  • 52
  • 494
  • 543
  • In your view, wouldn't it be nicer if the API were based on iterators? – NPE Apr 01 '13 at 19:03
  • @NPE: Would you want *both* operations to be based on iterators? Or only the inner one? – David Rodríguez - dribeas Apr 01 '13 at 19:10
  • @NPE: In fact, my third edit was going to comment on that. But now it's in the comment section where it belongs. :) Yes, it would perhaps be better style for library code to accept an output iterator. But it just depends on who's using it and who's writing it. – GManNickG Apr 01 '13 at 19:12
  • @GManNickG - I feel like the top function deserves a different or more descriptive name(and more descriptive parameters/arguments). Maybe something like SplitAndMergeWithVector(&inputstring, delimeter, &outputvector). I can defenitely see the benefit of doing it this way though. It is definitely more powerful - but I feel it is still less "elegant" than the combined version - whatever that means. Thanks for your help in understanding its power/benefit. – Stepan1010 Apr 01 '13 at 21:10
3

The code you copied from the original provides two functions with slightly different semantics. The first one adds to an existing vector the result of the splitting, while the second one, built on top of the first one creates a new vector with only the elements from this split.

By providing two functions you can offer (with the same cost) two different behaviors that might match different needs of different users. If you merge them into the single one version, and a user needed to build a list with all the tokens obtained from splitting multiple strings, it would have to create multiple vectors and merge them.

An interesting bit, not really part of the question on the design of one / two functions, is the actual implementation. The second function should be implemented as:

std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, elems);
    return elems;

}

The difference being that instead of returning the reference obtained from calling split internally, you should return the local variable directly. This change enables Named Return Value Optimization of the local variable, removing the cost of the copy*.

* In C++11 you could also use return std::move(split(s,delim,elems));, to obtain the same behavior, but this requires more keystrokes and is still a move rather than removing the whole operation.

David Rodríguez - dribeas
  • 204,818
  • 23
  • 294
  • 489
  • Thanks for the explanation. Read the first paragraph here - it explains the benefit of having the two seperate functions very well. – Stepan1010 Apr 01 '13 at 21:18