I'm new to regular expressions in C++ and was wondering if anyone could tell me what I'm doing wrong here. I'm trying to make a regular expression match an HTML element. With the code below, it matches in all cases except where whitespace separates the content from the tags.

string opening_tag = "(<[[:alpha:]]+>)";   
string content = "([\\w ]*)";   // zero or more characters or spaces
string closing_tag = "(</[[:alpha:]]+>)";
string html_element = opening_tag + content + closing_tag;

regex r(html_element);

string s;
while (cin >> s)
{
    if (regex_match(s, r))
    {
        cout << "matched" << endl;
    }
}
Etheryte

1 Answer

Introduction

Your problem is not related to the regular expression itself, but to how you are reading your data.


Explanation

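When using operator>> on a std::string you are effectively reading "word" by "word": the operator skips leading whitespace, then reads as many characters as it can until it hits the next whitespace character. An input line such as <b>hello world</b> therefore reaches regex_match as two separate strings, neither of which is a complete element.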
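To make the splitting concrete, here is a minimal sketch that feeds one line through operator>> using a std::istringstream in place of std::cin. The helper name split_with_extraction is made up for this illustration and is not part of the original code.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): returns every token that
// operator>> extracts from `input`, in order.
std::vector<std::string> split_with_extraction(const std::string& input) {
    std::istringstream in(input);   // stands in for std::cin
    std::vector<std::string> tokens;
    std::string token;
    while (in >> token) {           // stops at each run of whitespace
        tokens.push_back(token);
    }
    return tokens;
}
```

Calling split_with_extraction("<b>hello world</b>") yields two tokens, "<b>hello" and "world</b>", which is why regex_match never sees the whole element.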

If you'd like to read an entire line from std::cin and store it in string s, use std::getline, as in the sample snippet below:

while (std::getline (std::cin, s)) {
  ...
}
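Putting the pieces together, a fuller sketch might look like the following. It assumes the same pattern as in the question, with \w written as [_[:alnum:]] for portability. The function names is_html_element and count_matching_lines are hypothetical, chosen only for this example.

```cpp
#include <cstddef>
#include <istream>
#include <regex>
#include <string>

// Hypothetical helper: true if the whole line has the form <tag>content</tag>,
// using the pattern from the question with \w spelled as [_[:alnum:]].
bool is_html_element(const std::string& line) {
    static const std::regex element(
        "(<[[:alpha:]]+>)([_[:alnum:] ]*)(</[[:alpha:]]+>)");
    return std::regex_match(line, element);
}

// Hypothetical helper: reads `in` one full line at a time with std::getline
// and counts how many lines match.
std::size_t count_matching_lines(std::istream& in) {
    std::size_t matched = 0;
    std::string line;
    while (std::getline(in, line)) {   // keeps interior spaces intact
        if (is_html_element(line)) {
            ++matched;
        }
    }
    return matched;
}
```

Passing std::cin to count_matching_lines reads standard input line by line, so a line such as <b>hello world</b> is matched as a single string. Note that the pattern does not check that the closing tag name equals the opening one, just as in the original.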

Note: The regular expression constructed in your snippet is valid C++. However, some implementations don't fully support character-class escapes such as \w. For example, with older releases of libstdc++ you may need to replace \w with the equivalent [_[:alnum:]], making string content = "([_[:alnum:] ]*)".

Filip Roséen - refp