C++11 standard has std::regex
. It also included in TR1 for Visual Studio 2010
. Actually TR1 is available since VS2008, it's hidden under std::tr1
namespace. So you don't need Boost.Regex for VS2008 or later.
Splitting can be performed using regex_token_iterator
:
#include <iostream>
#include <string>
#include <regex>
const std::string s("The-meaning-of-life-and-everything");
const std::tr1::regex separator("-");
const std::tr1::sregex_token_iterator endOfSequence;
std::tr1::sregex_token_iterator token(s.begin(), s.end(), separator, -1);
while(token != endOfSequence)
{
std::cout << *token++ << std::endl;
}
if you need to get also the separator itself, you could obtain it from sub_match
object pointed by token
, it is pair containing start and end iterators of token.
while(token != endOfSequence)
{
const std::tr1::sregex_token_iterator::value_type& subMatch = *token;
if(subMatch.first != s.begin())
{
const char sep = *(subMatch.first - 1);
std::cout << "Separator: " << sep << std::endl;
}
std::cout << *token++ << std::endl;
}
This is sample for case when you have single char separator. If separator itself can be any substring you need to do some more complex iterator work and possible store previous token submatch object.
Or you can use regex groups and place separators in first group and the real token in second:
const std::string s("The-meaning-of-life-and-everything");
const std::tr1::regex separatorAndStr("(-*)([^-]*)");
const std::tr1::sregex_token_iterator endOfSequence;
// Separators will be 0th, 2th, 4th... tokens
// Real tokens will be 1th, 3th, 5th... tokens
int subMatches[] = { 1, 2 };
std::tr1::sregex_token_iterator token(s.begin(), s.end(), separatorAndStr, subMatches);
while(token != endOfSequence)
{
std::cout << *token++ << std::endl;
}
Not sure it is 100% correct, but just to illustrate the idea.