I have noticed that the character class [:blank:] also matches \v, as demonstrated by the code below. Per POSIX, however, \v shouldn't be in that class, should it?
#include <string>
#include <iostream>
#include <boost/regex.hpp>
using namespace std;
using namespace boost;
int main() {
    std::string const text{"\v"};
    cout << (sregex_token_iterator{text.begin(), text.end(), regex{R"((?-m)^([[:blank:]])$)"}} != sregex_token_iterator{});
    cout << (sregex_token_iterator{text.begin(), text.end(), regex{R"((?-m)^([ \t])$)"}} != sregex_token_iterator{}) << '\n';
    // output: 10, but I expected 00
    return 0;
}
Since this page of the Boost documentation doesn't mention all of the character classes that I see listed here, I suspect that Boost regexes are not POSIX-compliant, even though they use some of those named character classes. In fact, the word POSIX doesn't even appear on that Boost page, so I guess I'm almost answering my own question, but I don't feel confident enough.
I haven't checked which of these characters fall into [:blank:] and/or [:space:], but I guess there might be some other surprises here too:
const auto LF = "\x0A";
const auto VT = "\x0B";
const auto FF = "\x0C";
const auto CR = "\x0D";
const auto CRLF = "\x0D\x0A";
const auto NEL = "\xC2\x85";
const auto LS = "\xE2\x80\xA8";
const auto PS = "\xE2\x80\xA9";