How to adapt a string splitting algorithm using pointers so it uses iterators instead?

Question

The code below comes from an answer to this question on string splitting. It uses pointers, and a comment on that answer suggested it could be adapted for std::string. How can I use the features of std::string to implement the same algorithm, for example using iterators?

#include <vector>
#include <string>
using namespace std;

vector<string> split(const char *str, char c = ',')
{
    vector<string> result;

    do
    {
        const char *begin = str;

        while(*str != c && *str)
          str++;

        result.push_back(string(begin, str));
    } while (0 != *str++);

    return result;
}

I started by replacing const char * with string, but then I noticed that the code keeps a pointer to the beginning of the character sequence. Is that even possible with strings? How do the loop termination criteria change? Is there anything else I need to worry about when making this change?

You could replace `const char *str` by `const std::string& str` and see how far you get. If you get stuck at some point, this would make a good question, but at the moment there is no evidence for any effort made from your side — 463035818_is_not_an_ai, Nov 10 '15 at 11:54
You might want to read about [iterators](http://en.cppreference.com/w/cpp/string/basic_string/begin). — Andrew, Nov 10 '15 at 12:00
I'd attempt to create a `vector` like this: http://stackoverflow.com/a/28880605/2642059 — Jonathan Mee, Nov 10 '15 at 12:04
The real question is "how do I transform this algorithm written in terms of pointers to one in terms of iterators, or another concept that is readily modelled by the `std::string` type?" As such, it's not a duplicate of a question about how to split a string; it's just in need of editing. — Andrew, Nov 10 '15 at 12:09
Thanks Andrew, that is exactly what I am looking for, phrasing myself correctly was rather difficult. Will edit the question accordingly. — Kevin Zakka, Nov 10 '15 at 12:11

Andrew · Accepted Answer · 2019-06-21T11:26:32.700

2

You can use iterators instead of pointers. Iterators provide a way to traverse containers, and can usually be thought of as analogous to pointers.

In this case, you can use the begin() member function (or cbegin() if you don't need to modify the elements) of a std::string object to obtain an iterator that references the first character, and the end() (or cend()) member function to obtain an iterator for "one-past-the-end".
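As a minimal sketch of that analogy, the loop below walks a std::string from cbegin() to cend(), dereferencing and incrementing the iterator much as one would a const char *. The function name count_delims is hypothetical and exists only for illustration; it is not part of the original algorithm.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration: counts occurrences of delim in str
// by walking the string with const iterators instead of a const char *.
std::size_t count_delims(const std::string& str, char delim = ',') {
    std::size_t count = 0;

    // cbegin() refers to the first character; cend() is one-past-the-end
    // and must never be dereferenced.
    for (auto iter = str.cbegin(); iter != str.cend(); ++iter) {
        if (*iter == delim) ++count;  // dereference works like *ptr
    }

    return count;
}
```

Calling count_delims("a,b,c") should report two delimiters, and an empty string should report none, because cbegin() and cend() compare equal and the loop body never runs.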

For the inner loop, your termination criterion is the same; you want to stop when you hit the delimiter on which you'll be splitting the string. For the outer loop, instead of comparing the character value against '\0', you can compare the iterator against the end iterator you already obtained from the end() member function. The rest of the algorithm is pretty similar; iterators work like pointers in terms of dereference and increment:

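#include <string>
#include <vector>

std::vector<std::string> split(const std::string& str, const char delim = ',') {
    std::vector<std::string> result;

    auto end = str.cend();    // one-past-the-end; never dereferenced
    auto iter = str.cbegin(); // first character

    while (iter != end) {
        auto begin = iter;    // start of the current token

        // Advance to the next delimiter, checking for end before dereferencing.
        while (iter != end && *iter != delim) ++iter;

        result.push_back(std::string(begin, iter));
        if (iter != end) ++iter; // See note (**) below.
    }

    return result;
}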

Note the subtle difference in the inner loop condition: it now tests whether we've reached the end before dereferencing, because an iterator that refers to the end of a container must not be dereferenced. The original algorithm assumes that a null character terminates the string, so dereferencing a pointer to that position is fine there.
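One way to avoid writing that end check by hand is to delegate the inner scan to std::find, which returns the end iterator when the delimiter is absent and never dereferences it. The following is only a sketch of that alternative, not the code from the answer; the name split_find is an assumption made for illustration, and emplace_back is used as suggested in the comments to construct each token in place.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Sketch: same outer loop as the iterator version, but the inner scan is
// delegated to std::find, which returns `end` if delim is not found.
// split_find is a hypothetical name used only for this example.
std::vector<std::string> split_find(const std::string& str, char delim = ',') {
    std::vector<std::string> result;

    const auto end = str.cend();
    auto iter = str.cbegin();

    while (iter != end) {
        const auto begin = iter;
        iter = std::find(begin, end, delim);  // stops at delim or at end
        result.emplace_back(begin, iter);     // construct the token in place
        if (iter != end) ++iter;              // step over the delimiter only
    }

    return result;
}
```

With this sketch, split_find("a,,b") should yield "a", an empty string, and "b", since two adjacent delimiters enclose an empty token.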

(**) The validity of iter++ != end when iter is already end is under discussion in "Are end+1 iterators for std::string allowed?". I've added this if statement so that the increment is skipped when the inner loop has already reached end; the outer loop condition then terminates the loop. This avoids adding one to an iterator which is already the end iterator, and avoids the potential problem.
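One behavioural difference from the pointer version is worth noting: because the while loop tests iter != end before each token, an input that ends in a delimiter (for example "a,") produces no final empty string, and an empty input produces an empty vector, whereas the do/while in the question produces a trailing empty string in both cases. If that behaviour matters, a sketch along the lines suggested in the comments is to test for end after storing the token and before incrementing. The name split_keep_empty is hypothetical and used only for this example.

```cpp
#include <string>
#include <vector>

// Sketch: mirrors the pointer version's do/while, so a trailing delimiter
// yields a final empty string and an empty input yields one empty string.
// split_keep_empty is a hypothetical name used only for this example.
std::vector<std::string> split_keep_empty(const std::string& str, char delim = ',') {
    std::vector<std::string> result;

    const auto end = str.cend();
    auto iter = str.cbegin();

    for (;;) {
        const auto begin = iter;

        // Check for end before dereferencing.
        while (iter != end && *iter != delim) ++iter;

        result.emplace_back(begin, iter);

        if (iter == end) break;  // leave before incrementing past end
        ++iter;                  // step over the delimiter
    }

    return result;
}
```

Here the iterator is never advanced beyond end, so the end+1 question does not arise, and split_keep_empty("a,") should return "a" followed by an empty string, matching the original pointer-based code.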

edited Jun 21 '19 at 11:26

answered Nov 12 '15 at 11:04

Andrew


Note: there is a discussion on the validity of `iter++ != end` where `iter` could already be `end`, see http://stackoverflow.com/questions/33657050/are-end1-iterators-for-stdstring-allowed – Matthieu M. Nov 12 '15 at 12:44
@MatthieuM. I've added a check for the case where `iter` reaches `end` within the inner loop. I'd prefer to write this kind of thing as a `for` loop over the range defined by the iterators, ensuring that this `end + 1` scenario cannot happen, but I wanted to keep the code as close as possible to the original for the purposes of answering the question. – Andrew Nov 12 '15 at 13:00
Thanks Andrew, appreciate it. – Kevin Zakka Nov 12 '15 at 13:51
Is there any reason not to write the outer loop as `for (auto iter = str.cbegin(); iter != end; ++iter)`? – Martin Bonner supports Monica Jun 20 '19 at 13:05
The only thing your test at the end of the outer loop does is increment the iterator; it can never exit the loop because you have already checked that in the preceding line. – Martin Bonner supports Monica Jun 20 '19 at 13:06
Final comment: adding to `result` with `result.emplace_back(begin, iter)` constructs the final string directly in place. Total is `for (auto iter = str.cbegin(); iter != end; ++iter) { auto begin = iter; while (iter != end && *iter != delim) ++iter; result.emplace_back(begin, iter); }` – Martin Bonner supports Monica Jun 20 '19 at 13:09
@MartinBonner yes, see the comment at the bottom of the post - there was some discussion about the validity of `iter++` when `iter == end`, so I added the `if` to avoid that confusion, but didn't otherwise modify the code. The purpose was to illustrate how to transform the code from the question to use iterators, not to provide a perfectly efficient snippet people can copy & paste without thinking about it! – Andrew Jun 20 '19 at 15:53
@Andrew OK. But avoiding `iter++ != end` when `iter` already equals `end` means you have to change the loop termination. At the very least you should change it to `... if (iter == end) break; iter++; } while (true)`. You may not want to make it perfect, but I don't think you should make it deliberately bad. – Martin Bonner supports Monica Jun 20 '19 at 17:03
@MartinBonner there's nothing deliberate about it. – Andrew Jun 21 '19 at 11:21
