1

I have this code to split a string. For some reason, it just sits there doing nothing. I am not sure what the problem is. By the way, delim = ' ' here.

vector<string> split( const string &str, const char &delim )
{
    typedef string::const_iterator iter;

    iter beg = str.begin();

    vector<string> tokens;

    while(beg != str.end())
    {
        iter temp = find(beg, str.end(), delim);
        if(beg != str.end())
            tokens.push_back(string(beg, temp));
        beg = temp;
    }

    return tokens;
}
Svante
  • 50,694
  • 11
  • 78
  • 122
Charles Khunt
  • 2,375
  • 5
  • 25
  • 25
  • 1
    Btw, there's no point testing "beg != str.end()" inside the while loop. You haven't changed beg or str since the top of the loop, where the condition was tested. Also, it's more common to pass a char by value rather than const reference. Neither of these things is the bug, they just make your code simpler and easier to read, without affecting behaviour. – Steve Jessop May 26 '09 at 11:16

10 Answers10

6

Here is another nice and short Boost-based version that uses a whole string as delimiter:

std::vector<std::string> result;
boost::iter_split(result, str, boost::first_finder(delim));

Or case-insensitive:

std::vector<std::string> result;
boost::iter_split(result, str, 
    boost::first_finder(delim, boost::is_iequal()));
mavam
  • 12,242
  • 10
  • 53
  • 87
5

I could debug it for you, I guess but that won't help you in the long run. Here's what you do.

After every line, put a printf() or cout staement dumping the changed variables to standard output. Then run your code, passing a simple set of parameters to it:

vector<string> x = split ("Hello there, Bob.", ' ');

Then, examine the output to see why your implementation isn't working. You'll probably have to break out of the code since, if it's just sitting there, you've probably got yourself one of those new-fangled infinite loops.

Give a man a fish and he'll eat for a day, teach a man to fish, he'll never be hungry again.

Or the Terry Pratchett version:

Give a man some fire and he'll be warm for a day, set a man on fire, he'll be warm for the rest of his life.

Update:

Since you've stated that you've actually done what I suggested, here's what I found out from doing it. It's evident that when you set beg to temp at the end of the while loop, it's pointing at the space. That was discovered by printing the beg string at the top of the while loop - it never changed after the first word was extracted.

Then, when you do the next find, it finds that exact same space rather than first skipping spaces then calling find properly. You need to skip the spaces after each find, making sure you don't iterate beyond the end of the string.

This is my solution. Use it as you wish.

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
using namespace std;

vector<string> split( const string &str, const char &delim ) {
    typedef string::const_iterator iter;
    iter beg = str.begin();
    vector<string> tokens;

    while(beg != str.end()) {
        //cout << ":" << beg._Myptr << ":" << endl;
        iter temp = find(beg, str.end(), delim);
        if(beg != str.end())
            tokens.push_back(string(beg, temp));
        beg = temp;
        while ((beg != str.end()) && (*beg == delim))
            beg++;
    }

    return tokens;
}

int main () {
    vector<string> x = split ("Hello, my name is Bob. ", ' ');
    return 0;
}

Without that space-skipping code at the end of the while loop, the output was:

:Hello, my name is Bob. :
: my name is Bob. :
: my name is Bob. :
: my name is Bob. :
: my name is Bob. :
: my name is Bob. :
: my name is Bob. :
: my name is Bob. :

and so on, ad infinitum. With the skipping code, you get:

:Hello, my name is Bob. :
:my name is Bob. :
:name is Bob. :
:is Bob. :
:Bob. :
paxdiablo
  • 854,327
  • 234
  • 1,573
  • 1,953
  • yes, that's exactly what i did before i posted, sorry for not stating that .. for some reason, it's in an infinite loop and I think it's because of what Charles Bailey pointed out.. I tried beg = ++temp so that beg points to character after the delimiter but still bombs out .. – Charles Khunt May 26 '09 at 07:20
  • I also like, "Give a man a fish, and he'll eat for a day. Give him a Playstation and he won't bother you for weeks". – Steve Jessop May 26 '09 at 11:10
5

I've got to love Boost, as it's providing a handy solution to this one as well:


std::vector<std::string> Split(const std::string &s, const std::string &d)
{
        std::vector<std::string> v;

        for (boost::split_iterator<std::string::iterator> i = boost::make_split_iterator(s, boost::first_finder(d, boost::is_iequal()));
             i != boost::split_iterator<std::string::iterator>();
             ++i) {
                v.push_back(boost::copy_range<std::string>(*i));
        }

        return v;
}
nhaa123
  • 9,570
  • 11
  • 42
  • 63
3

There is a problem in your while loop in that if the delimiter is found then temp will point to the first delimiter after the first find call.

At the end of the while loop you set beg to the value of temp.

Now beg also points to the first delimiter.

When find is next called it will return the current value of beg again as it does point to a delimiter.

temp hasn't moved on from it's previous value so you are in an infinite loop.

CB Bailey
  • 755,051
  • 104
  • 632
  • 656
2

Maybe this one:

std::vector<std::string> &mysplit(const std::string &s, char delim, std::vector<std::string> &elems) {
    std::stringstream ss(s);
    std::string item;
    while(std::getline(ss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}
1

find() will return the position X of the next token. When you then assign this to beg and go into the next iteration, it will start searching at position X again - and again - and again ... i.e. you are stuck in an endless loop.

Try this code:

vector<string> split( const string &str, const char &delim )
{
    typedef string::const_iterator iter;

    vector<string> tokens;
    iter pos = str.begin(), last = str.begin();

    while(pos != str.end()) {
        last = pos;
        pos = find(pos, str.end(), delim);

        if (pos != str.end()) {
            string token = string(last, pos);
            if (token.length() > 0)
                tokens.push_back(token);

            last = ++pos;
        }
    }

    string lastToken = string(last, pos);
    if (lastToken.length() > 0)
        tokens.push_back(lastToken);

    return tokens;
}

This has the added benefit that it will include the last token in the list (e.g. when splitting on space, the string "a b c" will now return tokens a, b and c instead of only a and b) and that multiple delims will not lead to empty tokens.

dseifert
  • 1,318
  • 9
  • 22
1
vector<string> split( const string &str, const char &delim )
{
    typedef string::const_iterator iter;

    iter beg = str.begin();

    vector<string> tokens;

    while(beg != str.end())
    {
        iter temp = find(beg, str.end(), delim);
        if(beg != str.end())
            tokens.push_back(string(beg, temp));
        if(temp != str.end())
            temp++;
        beg = temp;
    }

    return tokens;
}
Charles Khunt
  • 2,375
  • 5
  • 25
  • 25
1

You don't have to reinvent the wheel, boost provides a string splitting function for you.
Example code:

string stringtobesplit = "AA/BB-CC")
vector<string> tokens;

boost::split(tokens, stringtobesplit, boost::is_any_of("/-")); 
// tokens now holds 3 items: AA BB CC
hamishmcn
  • 7,843
  • 10
  • 41
  • 46
0

The simplest way to debug this code, is to print all the positions beg will be. If beg doesn't increase then that's your problem.

Peter Stuifzand
  • 5,084
  • 1
  • 23
  • 28
0

Apart from the beg needing to be incremented with the size of the delimiter, one special case is missed: the case where no delimiters are in the string.

xtofl
  • 40,723
  • 12
  • 105
  • 192
  • I think that case works fine - it pushes a single token into the vector, that token being the entire string. – Steve Jessop May 26 '09 at 11:18
  • It only pushes a token if find doesn't return the end. It will return the end if no delimiter is in the string. Or won't it? – xtofl May 26 '09 at 12:24
  • As written, the code always pushes a token (unless the string's empty), because confusingly it tests "beg" against end, not the return value from find. So the test in the loop is always true. – Steve Jessop May 26 '09 at 16:29