19

I have a series of strings stored in a single array, separated by nulls (for example ['f', 'o', 'o', '\0', 'b', 'a', 'r', '\0'...]), and I need to split this into a std::vector<std::string> or similar.

I could just write a 10-line loop to do this using std::find or strlen (in fact I just did), but I'm wondering if there is a simpler/more elegant way to do it, for example some STL algorithm I've overlooked, which can be coaxed into doing this.

It is a fairly simple task, and it wouldn't surprise me if there's some clever STL trickery that can be applied to make it even simpler.

Any takers?

jalf
  • some of the answers found here (http://stackoverflow.com/questions/236129/how-to-split-a-string-in-c) can be applied to your problem, be sure to have a look. – João Portela Aug 30 '11 at 13:17
  • This isn't really a duplicate since these strings are null-terminated, and all c-string algorithms apply (calling constructors on raw pointers in the buffer, using `strlen`, etc.) – slaphappy Aug 30 '11 at 13:18
  • Problem: how do you know when to stop? Or do you know the length of the array / have a sentinel (e.g. two consecutive zero chars)? – Konrad Rudolph Aug 30 '11 at 13:22
  • @Konrad: I know where the last string ends. You can assume that it is the end of the buffer, or two consecutive nulls. Both can be arranged trivially. :) – jalf Aug 30 '11 at 13:27

7 Answers

36

My two cents:

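const char* p = str;  // str points at the first string in the buffer
std::vector<std::string> vector;

do {
  vector.push_back(std::string(p));  // copies up to the next '\0'
  p += vector.back().size() + 1;     // skip the string and its terminator
} while (/* whatever condition applies */);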
slaphappy
  • That's really nice! Just the choice of `vector` as a variable name is, IMO, not so clever. – leftaroundabout Aug 30 '11 at 18:47
  • Since the list is double-NUL terminated, it can't contain empty strings (since that would look like the end of the list, i.e. `"abc\0\0def\0ghi\0"`). So you can just put the `while` at the top, to correctly deal with an empty list: `while (*p) { ... }` – Scott Smith Feb 02 '21 at 22:06
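The `while (*p)` variant suggested in the comment above might look like the sketch below. It assumes the list ends with one extra '\0' after the last string (so it cannot hold empty strings). `split_double_null` is a hypothetical name used only for illustration; it does not appear in the answer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: splits a double-NUL-terminated list such as
// "foo\0bar\0\0". Assumes the extra '\0' really is present; without it
// the loop would read past the end of the buffer.
std::vector<std::string> split_double_null(const char* p)
{
    assert(p != nullptr);
    std::vector<std::string> result;
    while (*p) {                        // an empty string marks the end of the list
        result.push_back(p);            // copies up to the next '\0'
        p += result.back().size() + 1;  // step over the string and its terminator
    }
    return result;
}
```

Because the test comes first, an empty list (a buffer that starts with '\0') simply yields an empty vector.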
9

Boost solution:

#include <boost/algorithm/string.hpp>
std::vector<std::string> strs;
//input_array must be a Range containing the input.
boost::split(
    strs,
    input_array,
    boost::is_any_of(boost::as_array("\0")));
Mankarse
  • Yeah, that should work, but relies on Boost. For something as simple as this, I'd prefer a standard-library-only solution – jalf Aug 30 '11 at 13:21
  • In cases where a pistol would be enough, a bazooka would likely get you both killed. – Lee Louviere Aug 30 '11 at 15:43
  • @Xaade - A lot of projects are using boost already, so it is not a big deal. A solution with similar code complexity that just uses Standard C++ components would obviously be better. (As a side note, this was actually quite difficult to get right, due to the requirement for boost::as_array.) – Mankarse Aug 30 '11 at 15:46
  • @Xaade: If a Leopard 2 tank is approaching you with an open hatch, a pistol would be enough, too. But it would require a lot of skill to do it right; the bazooka is safer. – Sebastian Mach Aug 31 '11 at 07:32
  • @phresnel Obviously a pistol isn't enough if the user isn't skilled. However, this is code, and if the code works every time, then no need for something bigger. – Lee Louviere Sep 06 '11 at 16:32
  • @Xaade: Though this argument also holds for the boost solution. – Sebastian Mach Sep 07 '11 at 05:49
6

The following relies on std::string having an implicit constructor taking a const char*, making the loop a very simple two-liner:

#include <iostream>
#include <string>
#include <vector>

template< std::size_t N >
std::vector<std::string> split_buffer(const char (&buf)[N])
{
    std::vector<std::string> result;

    for(const char* p=buf; p!=buf+sizeof(buf); p+=result.back().size()+1)
        result.push_back(p);

    return result;
}

int main()
{
    std::vector<std::string> test = split_buffer("wrgl\0brgl\0frgl\0srgl\0zrgl");

    for (auto it = test.begin(); it != test.end(); ++it)
        std::cout << '"' << *it << "\"\n";

    return 0;
}

This solution assumes that the buffer's size is known and that the end of the buffer marks the end of the list of strings. If the list is terminated by "\0\0" instead, the condition in the loop needs to be changed from p!=buf+sizeof(buf) to *p.

sbi
2

Here's the solution I came up with myself, assuming the buffer ends immediately after the last string:

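#include <algorithm>
#include <string>
#include <vector>

std::vector<std::string> split(const std::vector<char>& buf) {
    std::vector<std::string> drives;
    auto cur = buf.begin();
    while (cur != buf.end()) {
        // Each string runs from cur up to (not including) the next '\0'
        auto next = std::find(cur, buf.end(), '\0');
        drives.push_back(std::string(cur, next));
        cur = next + 1;  // relies on the buffer ending with a '\0'
    }
    return drives;
}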
jalf
2

A more elegant and actual solution (compared to my other answer) uses getline and boils down to two lines with only C++03; no manual loop bookkeeping or loop condition is required:

#include <iostream>
#include <sstream>
#include <string>

int main() {
    const char foo[] = "meh\0heh\0foo\0bar\0frob";

    std::istringstream ss (std::string(foo, foo + sizeof foo));
    std::string str;

    while (getline (ss, str, '\0'))
        std::cout << str << '\n';
}

However, note how the range-based string constructor already indicates an inherent problem with splitting at '\0's: you must know the exact size, or find some other char combination for the Ultimate Terminator.

Sebastian Mach
  • Yep, fortunately I do know the exact size. The data comes from a Microsoft API, so I'm stuck with the null-separated format. :) – jalf Aug 30 '11 at 13:57
  • I don't like the fact that this initializes and reads from a stream, but I like the simplicity of the loop. I wish we'd find a loop as simple as that that operates directly on the data, rather than copying into a stream. – sbi Aug 30 '11 at 15:56
  • True :) On the other hand, you could see the stream as a (slightly overblown) holder of the state we had to maintain manually. – Sebastian Mach Aug 31 '11 at 07:29
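For a loop that operates directly on the data rather than copying it into a stream, one possible sketch uses `std::find` over a pointer range. It assumes the exact size is known, as stated in the comments above. `split_sized` is a hypothetical name, and accepting a final string that lacks its terminator is a choice made for this sketch, not something the thread prescribes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits [data, data + size) at each '\0'.
// Unlike the double-NUL approach, empty strings in the middle are kept.
std::vector<std::string> split_sized(const char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    std::vector<std::string> result;
    const char* cur = data;
    const char* const end = data + size;
    while (cur != end) {
        const char* next = std::find(cur, end, '\0');
        result.push_back(std::string(cur, next));
        if (next == end) break;  // last string had no terminator
        cur = next + 1;          // step over the '\0'
    }
    return result;
}
```

With the buffer from the answer above, `split_sized(foo, sizeof foo)` yields the five strings without building a stream first.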
1

A bad answer, actually, but I doubted your claim of a 10-line loop for manual splitting. Four lines do it for me:

#include <vector>
#include <iostream>
int main() {
    using std::vector;

    const char foo[] = "meh\0heh\0foo\0bar\0frob";

    vector<vector<char> > strings(1);
    for (const char *it=foo, *end=foo+sizeof(foo); it!=end; ++it) {
        strings.back().push_back(*it);
        // Start a new string after each '\0', except after the final one
        if (*it == '\0' && it+1 != end) strings.push_back(vector<char>());
    }

    std::cout << "number of strings: " << strings.size() << '\n';
    for (vector<vector<char> >::iterator it=strings.begin(), end=strings.end(); 
         it!=end; ++it)
        std::cout << it->data() << '\n';
}
Sebastian Mach
-9

In C, string.h has this guy:

char * strtok ( char * str, const char * delimiters );

The example on cplusplus.com:

/* strtok example */
#include <stdio.h>
#include <string.h>

int main ()
{
  char str[] ="- This, a sample string.";
  char * pch;
  printf ("Splitting string \"%s\" into tokens:\n",str);
  pch = strtok (str," ,.-");
  while (pch != NULL)
  {
    printf ("%s\n",pch);
    pch = strtok (NULL, " ,.-");
  }
  return 0;
}

It's not C++, but it will work

user916499