
I have a string like this:

std::string input("I #am going to# learn how #to use #boost# library#");

I do this:

std::vector<std::string> splitVector;
boost::split(splitVector, input, boost::is_any_of("#"));

And I get this in splitVector:

splitVector:
        "I "
        "am going to"
        " learn how " 
        "to use "
        "boost"
        " library"
        "" // **That's odd, why do I have an empty string here ?**

But I need something like this:

splitVector:
    "I "
    "#am going to"
    "# learn how "
    "#to use "
    "#boost"
    "# library"
    "#"

How can I do that? Is there another way to do it with the Boost library? And why do I get an empty string at the end of splitVector?

Romz
  • Why do you need to keep the delimiters? – kaspersky Feb 18 '14 at 14:20
  • @gg.kaspersky, good question! Afterwards I have to restore the original string by building it up from splitVector, and I cannot tell whether the string had an odd or even number of delimiters, so I always restore it as if the number were even. For instance, if I split the strings "#test" and "#test#", I get "test" from the first and "test" from the second, and I restore both as "#test#". – Romz Feb 18 '14 at 14:32
  • The empty string is there because the input string after the last delimiter is empty. Since your delimiter is a single char, you can simply prepend (or append) the char to result strings. If there were many different delimiters, I don't think boost split has the feature you need. See for example [this](http://stackoverflow.com/questions/1511029/tokenize-a-string-and-include-delimiters-in-c) question for other solutions. – eerorika Feb 18 '14 at 14:34
  • `split` on `"#test#"` should return `{"","test",""}`? – Yakk - Adam Nevraumont Feb 18 '14 at 14:37
  • 1
    @edwin, I am not sure I properly understood your restore-string-use-case, but 1. you can build back the original string using join (http://stackoverflow.com/questions/1833447/a-good-example-for-boostalgorithmjoin) and 2. Number of '#' is equal to splitVector.size() - 1 – kaspersky Feb 18 '14 at 14:37
  • If you only have one type of delimiter, why don't you add the delimiter back after you have filled splitVector? Something like splitVector[i] = "#" + splitVector[i]. – fer y Feb 18 '14 at 14:51

1 Answer


You cannot use boost::split for this, because its internal implementation, which uses the split_iterator from boost/algorithm/string/find_iterator.hpp, discards the delimiters.

However you can get by with boost::tokenizer, as it has an option to keep the delimiters:

Whenever a delimiter is seen in the input sequence, the current token is finished, and a new token begins. The delimiters in dropped_delims do not show up as tokens in the output whereas the delimiters in kept_delims do show up as tokens.
http://www.boost.org/doc/libs/1_55_0/libs/tokenizer/char_separator.htm

A complete example:

#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>

int main() {
    // added consecutive tokens for illustration
    std::string text = "I #am going to# learn how ####to use #boost# library#";    
    boost::char_separator<char> sep("", "#"); // specify only the kept separators
    boost::tokenizer<boost::char_separator<char>> tokens(text, sep);
    for (std::string t : tokens) { std::cout << "[" << t << "]" << std::endl; }
}
/* Output:
[I ]
[#]
[am going to]
[#]
[ learn how ]
[#]
[#]
[#]
[#]
[to use ]
[#]
[boost]
[#]
[ library]
[#] */
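
The tokenizer returns each '#' as a token of its own, whereas the question asks for the delimiter to stay attached to the front of the piece that follows it. A minimal sketch of that layout, using only the standard library rather than Boost, could look like this. The name `split_keep_prefix` is hypothetical, and the sketch assumes a single delimiter character.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: start a new piece at every delimiter and keep the
// delimiter as the first character of that piece.
std::vector<std::string> split_keep_prefix(const std::string& input, char delim) {
    std::vector<std::string> pieces;
    std::string current;
    for (char c : input) {
        // A delimiter closes the piece collected so far (if there is one).
        if (c == delim && !current.empty()) {
            pieces.push_back(current);
            current.clear();
        }
        current += c; // the delimiter itself opens the next piece
    }
    if (!current.empty()) {
        pieces.push_back(current);
    }
    return pieces;
}
```

Called with the string from the question and '#', this yields "I ", "#am going to", "# learn how ", "#to use ", "#boost", "# library", "#". Because no character is dropped, concatenating the pieces in order gives back the original string.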
mockinterface