I would like to modify the regular expression below so that it produces the list of matches shown under Expected Matches. I am having a hard time describing the problem in words.
I want to use a regular expression to match a set of 'tokens'. Specifically, I want &&, ||, ;, ( and ) to each be matched as a token, and any run of text that contains none of those tokens should also be a match.
The problem I am having is distinguishing between one pipe and two pipes: || should be its own token, but a single | should stay part of the surrounding text. How can I produce the desired matches? Thank you very much for your help!
The expression:
((&{2})|(\|{2})|(\()|(\))|(;)|[^&|;()]+)
Test String
a < b | c | d > e >> f && ((g) || h) ; i
Expected Matches
a < b | c | d > e >> f
&&
(
(
g
)
||
h
)
;
i
Actual Matches
a < b
|
c
|
d > e >> f
&&
(
(
g
)
||
h
)
;
i
I am trying to implement a custom tokenizer for a program in C++.
Example Code
    #include <regex>
    #include <string>
    #include <vector>
    #include <boost/algorithm/string/trim.hpp>

    std::vector<std::string> Parser::tokenizeInput(std::string s) {
        std::vector<std::string> returnTokens;
        // tokenize using this regex (this is the expression that needs fixing)
        std::regex rgx(R"S(((&{2})|(\|{2})|(\()|(\))|(;)|[^&|;()]+))S");
        std::regex_iterator<std::string::iterator> rit(s.begin(), s.end(), rgx);
        std::regex_iterator<std::string::iterator> rend;
        while (rit != rend) {
            std::string tokenStr = rit->str();
            // trim first, so that whitespace-only matches of any length are dropped
            boost::algorithm::trim(tokenStr);
            if (!tokenStr.empty()) {
                returnTokens.push_back(tokenStr);
            }
            ++rit;
        }
        return returnTokens;
    }
Example Driver Code
    // in main
    std::vector<std::string> testVec = Parser::tokenizeInput(inputWithNoComments);
    std::cout << "input string: " << inputWithNoComments << std::endl;
    std::cout << "tokenized string[";
    for (std::size_t i = 0; i < testVec.size(); i++) {
        std::cout << testVec[i];
        if (i + 1 < testVec.size()) { std::cout << ", "; }
    }
    std::cout << "]" << std::endl;
Produced Output
input string: (cat file > outFile) || ( ls -l | grep -i )
tokenized string[(, cat file > outFile, ), ||, (, ls -l, grep -i, )]
input string: a && b || c > d >> e < f | g
tokenized string[a, &&, b, ||, c > d >> e < f, g]
input string: foo | bar || foo || bar | foo | bar
tokenized string[foo, bar, ||, foo, ||, bar, foo, bar]
What I Want the Output to be
input string: (cat file > outFile) || ( ls -l | grep -i )
tokenized string[(, cat file > outFile, ), ||, (, ls -l | grep -i, )]
input string: a && b || c > d >> e < f | g
tokenized string[a, &&, b, ||, c > d >> e < f | g]
input string: foo | bar || foo || bar | foo | bar
tokenized string[foo | bar, ||, foo, ||, bar | foo | bar]
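One possible direction, offered as a minimal sketch rather than a definitive fix: let the text alternative also accept a | as long as it is not immediately followed by another |, using a negative lookahead (which the default ECMAScript grammar of std::regex supports). The names tokenizeWithLookahead and trimCopy are hypothetical and not part of the code above; trimCopy only stands in for boost::algorithm::trim so that the sketch has no Boost dependency.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): returns a copy of the
// text without leading and trailing whitespace.
static std::string trimCopy(const std::string& text) {
    const char* whitespace = " \t\n\r";
    const std::size_t first = text.find_first_not_of(whitespace);
    if (first == std::string::npos) {
        return "";  // empty or whitespace-only input
    }
    const std::size_t last = text.find_last_not_of(whitespace);
    return text.substr(first, last - first + 1);
}

// Hypothetical free-function variant of Parser::tokenizeInput.
std::vector<std::string> tokenizeWithLookahead(const std::string& input) {
    // Operator alternatives come first: &&, ||, then one of ; ( ).
    // Text alternative: a run made of characters outside [&|;()], or of a
    // '|' that is NOT immediately followed by another '|'.
    static const std::regex tokenPattern(
        R"S(&&|\|\||[;()]|(?:[^&|;()]|\|(?!\|))+)S");

    std::vector<std::string> tokens;
    const std::sregex_iterator end;
    for (std::sregex_iterator it(input.begin(), input.end(), tokenPattern);
         it != end; ++it) {
        // Trim before the emptiness check so whitespace-only runs are dropped.
        const std::string token = trimCopy(it->str());
        if (!token.empty()) {
            tokens.push_back(token);
        }
    }
    return tokens;
}
```

With this pattern a lone & is still not part of any token, exactly as with the original character class, so it is silently skipped. Whether that is the desired behaviour is an open assumption.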