Right way to split an std::string into a vector

Question

What is the right way to split a string into a vector of strings? The delimiter is a space or a comma.

A split in which commas and spaces are both delimiters, or a function that splits either on space or on comma, according to a parameter? — Steve Jessop, Apr 09 '11 at 20:18
Some of the answers to http://stackoverflow.com/questions/236129/how-to-split-a-string can readily be adapted to work with multiple delimiters. — Gareth McCaughan, Apr 09 '11 at 20:23

score 143 · Answer 1 · edited Nov 01 '16 at 17:25


A convenient way is to use Boost's string algorithms library.

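#include <boost/algorithm/string/classification.hpp> // Include for boost::is_any_of
#include <boost/algorithm/string/split.hpp>          // Include for boost::split
#include <string>
#include <vector>
// ...

std::vector<std::string> words;
std::string s; // the input text to split
// Split on any comma or space; token_compress_on merges adjacent
// separators such as ", " so they do not produce empty tokens.
boost::split(words, s, boost::is_any_of(", "), boost::token_compress_on);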

edited Nov 01 '16 at 17:25

Ogre Psalm33


answered Apr 09 '11 at 20:24

UncleBens


What is `token_compress_on` for? – pooya13 Mar 31 '21 at 22:42
@pooya13 From the documentation: If `eCompress` (the fourth argument) is set to `token_compress_on`, adjacent separators are merged together. Otherwise, every two separators delimit a token. https://www.boost.org/doc/libs/1_49_0/doc/html/boost/algorithm/split_id820181.html – Matiboux Apr 03 '21 at 19:36
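The effect of `token_compress_on` can be illustrated without Boost. Below is a minimal standard-library sketch, not Boost's implementation: the helper `split_any_of` and its `compress` flag are hypothetical stand-ins for `boost::is_any_of` and `token_compress_on`, and edge cases (such as leading or trailing separators) are not claimed to match Boost exactly.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): splits `s` on any character in
// `delims`. When `compress` is true, a run of adjacent separators counts as
// a single separator, mimicking the described effect of token_compress_on.
std::vector<std::string> split_any_of(const std::string& s,
                                      const std::string& delims,
                                      bool compress) {
    std::vector<std::string> out;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type pos = s.find_first_of(delims, start);
        if (pos == std::string::npos) {
            out.push_back(s.substr(start));  // last token runs to the end
            break;
        }
        out.push_back(s.substr(start, pos - start));
        start = pos + 1;
        if (compress) {
            // Skip the rest of this run of separators.
            start = s.find_first_not_of(delims, start);
            if (start == std::string::npos) start = s.size();
        }
    }
    return out;
}
```

With `compress` off, `"a,,b"` yields three tokens (the middle one empty); with it on, the two commas are merged and only `"a"` and `"b"` remain. Note that this sketch still produces an empty token for a separator at the very start or end of the input.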

score 109 · Accepted Answer · edited Jun 20 '20 at 09:12


For space-separated strings, you can do this:

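#include <algorithm>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
std::string s = "What is the right way to split a string into a vector of strings";
std::stringstream ss(s);
std::istream_iterator<std::string> begin(ss);  // reads whitespace-separated words
std::istream_iterator<std::string> end;        // default-constructed: end of stream
std::vector<std::string> vstrings(begin, end);
std::copy(vstrings.begin(), vstrings.end(), std::ostream_iterator<std::string>(std::cout, "\n"));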

Output:

What
is
the
right
way
to
split
a
string
into
a
vector
of
strings

For strings that contain both commas and spaces:

#include <algorithm>
#include <cstring>
#include <iostream>
#include <iterator>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// A ctype facet that also classifies ',' and ' ' as whitespace, so that
// operator>> (and therefore istream_iterator) stops at either character.
struct tokens: std::ctype<char> 
{
    tokens(): std::ctype<char>(get_table()) {}

    static std::ctype_base::mask const* get_table()
    {
        typedef std::ctype<char> cctype;
        static const cctype::mask *const_rc = cctype::classic_table();

        static cctype::mask rc[cctype::table_size];
        std::memcpy(rc, const_rc, cctype::table_size * sizeof(cctype::mask));

        rc[','] = std::ctype_base::space;
        rc[' '] = std::ctype_base::space;
        return &rc[0];
    }
};

std::string s = "right way, wrong way, correct way";
std::stringstream ss(s);
// The locale takes ownership of the facet (refs defaults to 0) and deletes it.
ss.imbue(std::locale(std::locale(), new tokens()));
std::istream_iterator<std::string> begin(ss);
std::istream_iterator<std::string> end;
std::vector<std::string> vstrings(begin, end);
std::copy(vstrings.begin(), vstrings.end(), std::ostream_iterator<std::string>(std::cout, "\n"));

Output:

right
way
wrong
way
correct
way

edited Jun 20 '20 at 09:12

Community


answered Apr 09 '11 at 20:22

Nawaz



`std::vector<std::string> vstrings(begin, end);` would be nicer IMO, but I suppose we don't know whether the questioner is constructing the vector, or hoping to populate a pre-existing vector. – Steve Jessop Apr 09 '11 at 20:28
Nice, but wrong. The OP was specific in that both space and comma are delimiters. And you can't do the same trick in this case, can you? – Armen Tsirunyan Apr 09 '11 at 20:32
@Steve: Nice suggestion. @Armen: OP didn't mention anything when I gave the solution. The question doesn't seem to be clear enough. Otherwise there're some elegant ways to deal with both space and comma simultaneously: http://stackoverflow.com/questions/4888879/elegant-ways-to-count-the-frequency-of-words-in-a-file – Nawaz Apr 09 '11 at 20:34
I like the use of `istream_iterator` but why not finish strong using `ostream_iterator` as well? – user470379 Apr 09 '11 at 20:37
@Oli, @Steve and @Armen: Please see my second solution. And let me know if it can still be improved. :-) – Nawaz Apr 09 '11 at 20:47
@Nawaz: the obvious possible improvement would be to replace the final `for` loop with a call to `std::copy`. – Jerry Coffin Apr 09 '11 at 20:58
@Jerry: haha.. nice one. forgot that though other times I make use of it as well. thanks for reminding it. :-) – Nawaz Apr 09 '11 at 21:00

This is an amazing answer and needs to be highlighted somehow. – Samveen Apr 01 '13 at 11:07
@Samveen: Thanks for the appreciation. :-) – Nawaz Apr 01 '13 at 12:42

+1 Very nice. But don't you have to delete the tokens struct created in `ss.imbue(std::locale(std::locale(), new tokens()))` somewhere? – kafman Dec 28 '15 at 14:24

@Yes, I thought the read will do that. Use `auto loc = std::make_shared<tokens>()`, and then pass `ss.imbue(..., loc.get()));`. That should work. – Nawaz Dec 28 '15 at 15:20
@StringerBell The token-facet should be cleaned up by locale's destructor. – jdknight Aug 30 '16 at 23:01
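To check jdknight's point, here is a small sketch; the facet `counting_ctype`, the counter `g_destroyed`, and the function `imbue_and_discard` are hypothetical, for illustration only. A facet created with the default `refs = 0` is deleted by the locale machinery once the last `std::locale` holding it is destroyed, so no manual `delete` is needed, and additionally owning the same pointer through a `std::shared_ptr` would destroy it twice.

```cpp
#include <cstddef>
#include <locale>
#include <sstream>

// Hypothetical counter, incremented when the facet below is destroyed.
int g_destroyed = 0;

// Hypothetical facet that reuses the classic table (nullptr table argument)
// and only records its own destruction.
struct counting_ctype : std::ctype<char> {
    // refs = 0 (the default) hands ownership of the facet to std::locale.
    explicit counting_ctype(std::size_t refs = 0)
        : std::ctype<char>(nullptr, false, refs) {}
    ~counting_ctype() override { ++g_destroyed; }
};

void imbue_and_discard() {
    std::stringstream ss("a b");
    ss.imbue(std::locale(std::locale(), new counting_ctype()));
    // No delete here: when ss and the temporary locales go away, the last
    // locale referencing the facet is destroyed and the facet is deleted.
}
```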

score 34 · Answer 3 · answered Mar 20 '19 at 14:51


You can use `std::getline` with a delimiter:

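#include <sstream>
#include <string>
#include <vector>

std::string s, tmp;  // s holds the input text to split
std::stringstream ss(s);
std::vector<std::string> words;

// Each call reads up to the next ',' and discards the ',' itself.
while (std::getline(ss, tmp, ',')) {
    words.push_back(tmp);
    // .....
}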
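`std::getline` accepts only a single delimiter character, so the loop above splits on commas alone. One way to honour both delimiters from the question is a second pass over each field with `operator>>`, which splits on whitespace. A minimal sketch under that assumption; `split_comma_space` is a hypothetical name, and `operator>>` also treats tabs and newlines as separators.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits on ',' with std::getline, then splits each
// field on whitespace with operator>>. Empty pieces are never produced.
std::vector<std::string> split_comma_space(const std::string& s) {
    std::vector<std::string> words;
    std::istringstream outer(s);
    std::string field;
    while (std::getline(outer, field, ',')) {
        std::istringstream inner(field);
        std::string word;
        while (inner >> word) {  // skips leading/extra whitespace
            words.push_back(word);
        }
    }
    return words;
}
```

For example, `"right way, wrong way, correct way"` becomes `right`, `way`, `wrong`, `way`, `correct`, `way`.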

answered Mar 20 '19 at 14:51

James LT


score 21 · Answer 4 · edited Jan 04 '19 at 22:06


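#include <string>
#include <vector>
using namespace std;

vector<string> split(string str, string token){
    vector<string>result;
    while(str.size()){
        // string::size_type (not int) so the comparison with npos is well-defined
        string::size_type index = str.find(token);
        if(index!=string::npos){
            result.push_back(str.substr(0,index));
            str = str.substr(index+token.size());
            if(str.size()==0)result.push_back(str); // trailing delimiter: empty last token
        }else{
            result.push_back(str);
            str = "";
        }
    }
    return result;
}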

split("1,2,3",",") ==> ["1","2","3"]

split("1,2,",",") ==> ["1","2",""]

split("1token2token3","token") ==> ["1","2","3"]

edited Jan 04 '19 at 22:06

Biswajit Roy


answered Oct 25 '17 at 23:39

Shiqi Ai



I think we can use `break;` instead of `str = "";` as it unnecessarily adds empty string in the result if token is not found. split("1234", ",") ==> ["1234", ""] – Prateek Bhuwania Mar 13 '22 at 07:56

Tod · Answer 5 · score 10 · 2011-04-23T20:38:38.590

If the string has both spaces and commas, you can call the string class function `found_index = myString.find_first_of(delims_str, begin_index)` in a loop, checking for `!= npos` and inserting each token into a vector. If you prefer old school, you can also use C's `strtok()` function.
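A minimal sketch of both approaches, assuming empty pieces between adjacent delimiters should be dropped (`strtok` always drops them). The function names `split_find_first_of` and `split_strtok` are hypothetical.

```cpp
#include <cstring>
#include <string>
#include <vector>

// find_first_of loop: each delimiter found ends the current token.
std::vector<std::string> split_find_first_of(const std::string& s,
                                             const std::string& delims) {
    std::vector<std::string> tokens;
    std::string::size_type begin = 0;
    while (begin <= s.size()) {
        std::string::size_type found = s.find_first_of(delims, begin);
        if (found == std::string::npos) found = s.size();  // last token runs to the end
        if (found > begin) tokens.push_back(s.substr(begin, found - begin));
        begin = found + 1;
    }
    return tokens;
}

// Old-school alternative with C's strtok. strtok writes '\0' into its input,
// so it works on a private copy; it also keeps hidden static state, so it may
// not be safe to interleave calls or use it from several threads.
std::vector<std::string> split_strtok(const std::string& s, const char* delims) {
    std::vector<std::string> tokens;
    std::vector<char> buf(s.begin(), s.end());
    buf.push_back('\0');
    for (char* tok = std::strtok(buf.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.push_back(tok);
    }
    return tokens;
}
```

Both return `right`, `way`, `wrong`, `way` for `"right way, wrong way"` with delimiters `", "`.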

edited Apr 23 '11 at 20:38

answered Apr 09 '11 at 20:40

Tod


roach · Answer 6 · 2021-01-29T01:38:48.697

#include <sstream>
#include <string>
#include <vector>

std::vector<std::string> split(std::string text, char delim) {
    std::string line;
    std::vector<std::string> vec;
    std::stringstream ss(text);
    while(std::getline(ss, line, delim)) {
        vec.push_back(line);
    }
    return vec;
}

split("String will be split", ' ') -> {"String", "will", "be", "split"}

split("Hello, how are you?", ',') -> {"Hello", " how are you?"} (note the leading space in the second token)

EDIT: Here's a version I made that supports multi-character delimiters, although I'm not 100% sure it handles every case:

std::vector<std::string> split(std::string text, std::string delim) {
    std::vector<std::string> vec;
    if (delim.empty()) {  // find("") always matches, which would loop forever
        vec.push_back(text);
        return vec;
    }
    size_t pos = 0, prevPos = 0;
    while (true) {
        pos = text.find(delim, prevPos);
        if (pos == std::string::npos) {
            vec.push_back(text.substr(prevPos));
            return vec;
        }

        vec.push_back(text.substr(prevPos, pos - prevPos));
        prevPos = pos + delim.length();
    }
}

score 6 · Answer 7 · answered Mar 20 '20 at 23:09

Tweaked version from Techie Delight:

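#include <cstddef>
#include <string>
#include <vector>

std::vector<std::string> split(const std::string& str, char delim) {
    std::vector<std::string> strings;
    std::size_t start;
    std::size_t end = 0;
    // Skip any run of delimiters to find the next token start, then find its end,
    // so leading, trailing and consecutive delimiters yield no empty tokens.
    while ((start = str.find_first_not_of(delim, end)) != std::string::npos) {
        end = str.find(delim, start);
        strings.push_back(str.substr(start, end - start));  // end == npos takes the rest
    }
    return strings;
}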

Vitaly Protasov · Answer 8 · 2022-07-15T02:33:28.547

Here is my variant, which works somewhat like PHP's explode function: you pass the string and a list of delimiter characters, and empty tokens are dropped.

std::vector<std::string> explode(const std::string& data, const std::string& delimiters) {
    auto is_delim = [&](char c) { return delimiters.find(c) != std::string::npos; };
    std::vector<std::string> result;
    // i runs one past the end so the last token is flushed when i == len.
    for (std::string::size_type i(0), len(data.length()), pos(0); i <= len; i++) {
        if (i == len || is_delim(data[i])) {
            auto tok = data.substr(pos, i - pos);
            if (!tok.empty())  // skip empty pieces between adjacent delimiters
                result.push_back(tok);
            pos = i + 1;
        }
    }
    return result;
}

Example of usage:

std::string test_delimiters("hello, there is lots of, delimiters, that may be even together,  ");

auto dem_res = explode(test_delimiters, " ,"); // space or comma

for (const auto& word : dem_res) {
    std::cout << word << '\n';
}
std::cout << "end\n";

The output:

hello
there
is
lots
of
delimiters
that
may
be
even
together
end

score 0 · Answer 9 · answered Dec 18 '17 at 19:29

I wrote this code, which reads a line from standard input and splits it on spaces into a vector:

#include <iostream>
#include <vector>
#include <string>

using namespace std;

int main(){

    string line;
    getline(cin, line);
    int len = line.length();
    vector<string> subArray;  // receives the space-separated pieces

    // k marks the start of the current word; a space ends it.
    for (int j = 0, k = 0; j < len; j++) {
        if (line[j] == ' ') {
            string ch = line.substr(k, j - k);
            k = j+1;
            subArray.push_back(ch);
        }
        // Flush the final word when the end of the line is reached.
        if (j == len - 1) {
            string ch = line.substr(k, j - k+1);
            subArray.push_back(ch);
        }
    }

    return 0;
}

newbane2 · Answer 10 · 2021-04-08T12:14:33.660

Here is a modified version of roach's solution that splits on any of a set of single-character delimiters and optionally compresses runs of adjacent delimiters.

std::vector<std::string> split(std::string text, std::string delim, bool compress) 
{
    std::vector<std::string> vec;
    size_t pos = 0, prevPos = 0;
    while (1) 
    {
        pos = text.find_first_of(delim, prevPos);

        while(compress) 
        {
            if( prevPos == pos )
                prevPos++;
            else
                break;

            pos = text.find_first_of(delim, prevPos);
        }

        if (pos == std::string::npos) {
            if(prevPos != text.size())
                vec.push_back(text.substr(prevPos));
            return vec;
        }

        vec.push_back(text.substr(prevPos, pos - prevPos));
        prevPos = pos + 1;
    }
}

Example without compress:

std::string s = "  1.2  foo@foo . ";
auto res = split(s, ".@ ", false);
for (const auto& i : res)
    std::cout << "string {" << i << "}" << std::endl;

Output:

string {}
string {}
string {1}
string {2}
string {}
string {foo}
string {foo}
string {}
string {}

With compression, `split(s, ".@ ", true)` gives:

string {1}
string {2}
string {foo}
string {foo}

score 0 · Answer 11 · answered Jan 23 '22 at 21:00

Here's a function that will split up a string into a vector but it doesn't include empty strings in the output vector.

#include <string>
#include <vector>
using namespace std;

vector<string> split(string str, string token) {
    vector<string> result;
    while (str.size()) {
        string::size_type index = str.find(token);  // size_type, not int, to compare with npos
        string substr;
        if ((substr = str.substr(0, index)) == "") {
            // Delimiter at the front: drop it without recording an empty token.
            str = str.substr(index + token.size());
        } else if (index != string::npos) {
            result.push_back(substr);
            str = str.substr(index + token.size());
        } else {
            result.push_back(str);
            str = "";
        }
    }
    return result;
}

Note: The above was adapted from this answer.

score 0 · Answer 12 · edited Mar 19 '22 at 12:48

#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Splits str on every occurrence of delimiter. If max_elements is 2 or more,
// splitting stops after max_elements - 1 pieces and the rest of the string
// becomes the last piece.
static inline std::vector<std::string> split(const std::string &str, const std::string &delimiter = " ", const int max_elements = 0) {
    std::vector<std::string> tokens;
    std::string::size_type start_index = 0;
    while (true) {
        std::string::size_type next_index = str.find(delimiter, start_index);
        if (next_index == std::string::npos) {
            tokens.push_back(str.substr(start_index));
            break;
        } else {
            tokens.push_back(str.substr(start_index, next_index - start_index));
            start_index = next_index + delimiter.length();
        }
        if (max_elements > 0 && tokens.size() == static_cast<std::size_t>(max_elements - 1)) {
            tokens.push_back(str.substr(start_index));
            break;
        }
    }

    return tokens;
}

Usage

void test() {
    std::string a = "hello : world : ok : fine";
    auto r = split(a, " : ", 2);  // {"hello", "world : ok : fine"}
    for (const auto &e : r) {
        std::cout << e << std::endl;
    }
}

While this code may solve the question, [including an explanation](//meta.stackexchange.com/q/114762) of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please [edit] your answer to add explanations and give an indication of what limitations and assumptions apply. — Dharman, Mar 19 '22 at 12:49
