I don't know who closed this question but please actually read the question... This is a legitimate problem and I have done a good amount of research online and cannot find any way to implement this in C++. I can only assume whoever closed the question did not read it. (They didn't provide any reason for the question being closed so if you are going to close it again please explain why.)

I'm writing a C++ program that needs to take regular expressions defined in an XML Schema file and use them to validate XML data. The problem is that the flavor of regular expressions used by XML Schemas does not seem to be directly supported in C++.

For example, there are a couple of special character classes, \i and \c, that are not defined by default. The XML Schema regex language also supports something called "character class subtraction", which does not seem to be supported in C++ either.

Allowing the use of the \i and \c special character classes is pretty simple: I can just look for "\i" or "\c" in the regular expression and replace them with their expanded versions. Getting character class subtraction to work is a much more daunting problem...
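A rough sketch of that replacement step is below. The function names are made up for illustration, and the expansions are ASCII-only stand-ins chosen as an assumption: the real \i and \c classes cover many more Unicode characters than these. The scan skips escaped pairs so that a literal backslash followed by "i" is left alone, and it omits the surrounding brackets when the escape already sits inside a character class.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: expands \i and \c into ASCII-only approximations.
// Assumption: real XML Schema \i and \c also include non-ASCII characters.
std::string expand_name_escapes(const std::string& pattern)
{
    const std::string initial_body = "A-Za-z_:";          // stand-in for \i
    const std::string name_body    = "A-Za-z0-9._:\\-";   // stand-in for \c

    std::string out;
    int depth = 0; // how many '[' are currently open
    for (std::size_t i = 0; i < pattern.size(); ++i)
    {
        const char ch = pattern[i];
        if (ch == '\\' && i + 1 < pattern.size())
        {
            const char next = pattern[++i];
            if (next == 'i' || next == 'c')
            {
                const std::string& body = (next == 'i') ? initial_body : name_body;
                // Inside a class, splice in the body only; outside, wrap it.
                out += (depth > 0) ? body : "[" + body + "]";
            }
            else
            {
                // Any other escape is copied through unchanged.
                out += ch;
                out += next;
            }
            continue;
        }
        if (ch == '[')
            ++depth;
        else if (ch == ']' && depth > 0)
            --depth;
        out += ch;
    }
    return out;
}

// Hypothetical convenience wrapper: whole-string match, as XML Schema expects.
bool xsd_name_match(const std::string& pattern, const std::string& text)
{
    return std::regex_match(text, std::regex(expand_name_escapes(pattern)));
}
```

With these stand-ins, a pattern such as `\i\c*` turns into an ordinary bracket expression pair that std::regex accepts.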

The following regular expression is valid in an XML Schema definition, but in C++ it throws an exception saying it has unbalanced square brackets.

#include <iostream>
#include <regex>

int main()
{
    try
    {
        // Match any lowercase letter that is not a vowel
        std::regex rx("[a-z-[aeiuo]]");
    }
    catch (const std::regex_error& ex)
    {
        std::cout << ex.what() << std::endl;
    }
}

How can I get C++ to recognize character class subtraction within a regex? Or even better, is there a way to just use the XML Schema flavor of regular expressions directly within C++?

tjwrona1992
  • I think your question is perfectly valid (although the answer is probably that it's not supported). The [regex] tag is valid for this question as well, however, simply adding that tag is liable to get your question closed. Please don't add that tag back. In fact, I would strongly suggest not adding the [regex] tag to any question on SO. – cigien Mar 09 '21 at 03:13
  • I understand that this question is not specifically looking for a syntax error in a regular expression, but it is definitely a question directly about regular expressions so the "regex" tag applies. (unless I have a total misunderstanding of how tags are supposed to be used) – tjwrona1992 Mar 09 '21 at 04:32
  • Since your original question has been reopened, you can go ahead and close this question as a duplicate. – cigien Mar 11 '21 at 16:54
  • Thanks @cigien :) Now that I put a bounty on the original question I'm getting much closer to figuring out a solid solution. – tjwrona1992 Mar 11 '21 at 20:09

1 Answer

I have never heard of character class subtraction, but if you want any non-vowel lowercase letter, you can easily express that with an ordinary character class:

std::regex rx("[b-df-hj-np-tv-z]");
Tim Biegeleisen
  • The issue is, the string containing the regular expression will be user provided and I need to support user provided regex strings that use character class subtraction because the regex specification for XML Schema definitions requires it... That's why this is so frustrating haha. I would just use an alternative pattern string if I could. I'm starting to think I may need to offload my regular expressions to another language that does have support for this. I think Python might have a regex flavor that supports this. – tjwrona1992 Mar 09 '21 at 01:57
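
One possible way to handle user-provided patterns without leaving C++ is to rewrite each subtraction into a negative lookahead before handing the string to std::regex, since the default ECMAScript grammar supports `(?!...)`. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, it treats every unescaped `-[` inside a class as the start of a subtraction (which holds for XML Schema syntax, where a literal `[` inside a class must be escaped), and it does not translate other XML Schema features such as \p{...} categories.

```cpp
#include <cstddef>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts one bracket expression starting at
// pattern[pos] == '[' and advances pos past its closing ']'.
std::string convert_class(const std::string& pattern, std::size_t& pos)
{
    std::string base = "[";
    ++pos; // skip the opening '['
    while (pos < pattern.size())
    {
        const char ch = pattern[pos];
        if (ch == '\\' && pos + 1 < pattern.size())
        {
            // Copy escaped pairs through untouched.
            base += ch;
            base += pattern[pos + 1];
            pos += 2;
        }
        else if (ch == '-' && pos + 1 < pattern.size() && pattern[pos + 1] == '[')
        {
            ++pos; // skip '-', pos now sits on the inner '['
            const std::string subtracted = convert_class(pattern, pos);
            if (pos >= pattern.size() || pattern[pos] != ']')
                throw std::invalid_argument("expected ']' after subtracted class");
            ++pos; // skip the outer ']'
            // "base minus subtracted" == "not subtracted, then base".
            // The group keeps a following quantifier applied to both parts.
            return "(?:(?!" + subtracted + ")" + base + "])";
        }
        else if (ch == ']')
        {
            ++pos;
            return base + "]"; // plain class, no subtraction
        }
        else
        {
            base += ch;
            ++pos;
        }
    }
    throw std::invalid_argument("unterminated character class");
}

// Hypothetical entry point: rewrites every class in the pattern.
std::string rewrite_subtraction(const std::string& pattern)
{
    std::string out;
    std::size_t pos = 0;
    while (pos < pattern.size())
    {
        const char ch = pattern[pos];
        if (ch == '\\' && pos + 1 < pattern.size())
        {
            out += ch;
            out += pattern[pos + 1];
            pos += 2;
        }
        else if (ch == '[')
        {
            out += convert_class(pattern, pos);
        }
        else
        {
            out += ch;
            ++pos;
        }
    }
    return out;
}
```

The rewritten string can then be passed to `std::regex` and checked with `std::regex_match`, which requires the whole input to match, in line with how XML Schema patterns are applied. Nested subtractions fall out of the recursion, but this has not been checked against the full XML Schema grammar.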