I'm reading the documentation for std::regex_iterator<std::string::iterator>
because I'm trying to learn how to use it for parsing HTML tags. The example the site gives is:
#include <iostream>
#include <string>
#include <regex>

int main ()
{
    std::string s ("this subject has a submarine as a subsequence");
    std::regex e ("\\b(sub)([^ ]*)"); // matches words beginning by "sub"

    std::regex_iterator<std::string::iterator> rit ( s.begin(), s.end(), e );
    std::regex_iterator<std::string::iterator> rend;

    while (rit!=rend) {
        std::cout << rit->str() << std::endl;
        ++rit;
    }

    return 0;
}
(http://www.cplusplus.com/reference/regex/regex_iterator/regex_iterator/)
I have one question about that: if rend is never explicitly initialized, how can it be used meaningfully in the comparison rit!=rend?
Also, is this the tool I should be using for getting attributes out of HTML tags? What I want to do is take a string like "class='class1 class2' id = 'myId' onclick ='myFunction()' >"
and break it into the pairs ("class", "class1 class2"), ("id", "myId"), ("onclick", "myFunction()")
and then work with them from there. The regular expression I'm planning to use is
([A-Za-z0-9\\-]+)\\s*=\\s*(['\"])(.*?)\\2
and so I plan to iterate through the matches of that expression while keeping track of whether I'm still inside the tag (i.e. whether I've passed a '>'
character). Is it going to be too hard to do this?
Thank you for any guidance you can offer me.