C++ std::regex Regular Expressions Finding multiple matches

Question

I am trying to parse a certificate in C++ and decided it was a good opportunity to learn regex. I just learned about regex an hour or so ago, so excuse my lack of knowledge.

I am looking for all of the OUs associated with an entry.

I am doing the following:

std::smatch OuMatches;
std::string myCertSubject = "O=\"My Company, Incorporated\", OU=Technician Level - A3, OU=Access Level - 1, CN=\"Name, My\", E=namem@company.com";

std::regex subjectRx("OU=[[:w:]|[:s:]|[:digit:]|-]*", std::regex_constants::icase);
bool foundOU = std::regex_search(myCertSubject, OuMatches, subjectRx);

Why won't this give me both of the results that match my regex? Is there a way to get them all?

afaik std::regex is not implemented in libstdc++ yet. Use boost::regex — hellow, Sep 13 '13 at 19:04
@progenhard I expect that the two OU=... examples in my string above will match, but only until the next OU, CN, etc. Something like the CN entry above would NOT match, and now that I am thinking about it, nor would any OU containing other punctuation. Like I said I am new to regex, but what I want to do is get all of the OU fields, not including the following entry (OU=, CN=, E=, etc.) — Kyle Preiksa, Sep 13 '13 at 19:04
possible duplicate of [How to match multiple results using std::regex](http://stackoverflow.com/questions/21667295/how-to-match-multiple-results-using-stdregex) — Behrouz.M, May 29 '15 at 08:52

Tharwen · Answer 1 · score 6 · 2013-09-13T20:45:30.577

It looks like you're just trying to get a string that looks like OU=XXXXXXXXXXXXXXXXX followed by a comma or a semicolon.

This regex will do that:

OU=[^,;]+

What this means is the string OU=, followed by at least one character that isn't a comma or semicolon:

[^,;]+

Here's a code sample using this regex to print the matches (based on the example here):

#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string myCertSubject = "O=\"My Company, Incorporated\", OU=Technician Level - A3, OU=Access Level - 1, CN=\"Name, My\", E=namem@company.com";
    std::regex subjectRx("OU=[^,;]+", std::regex_constants::icase);

    // Walk over every non-overlapping match of subjectRx in the subject string.
    std::regex_iterator<std::string::iterator> it (myCertSubject.begin(), myCertSubject.end(), subjectRx);
    std::regex_iterator<std::string::iterator> end; // default-constructed: marks the end of the matches

    while (it != end)
    {
        std::cout << it->str() << std::endl; // full text of the current match
        ++it;
    }
}

edited Sep 13 '13 at 20:45

answered Sep 13 '13 at 19:12

Tharwen

That is correct. However, my regex seems to work (even if it is overly complicated and non-ideal at the moment); my issue is with accessing the results. I *will* encounter multiple OUs and I want to get them all (i.e. for display in a list or something) – Kyle Preiksa Sep 13 '13 at 19:15
This website doesn't think your regex works: http://www.debuggex.com/r/g02uW9S4tvj5La15/0 Are they implemented differently in C++? – Tharwen Sep 13 '13 at 19:20
Great resource, thank you. It at least somewhat worked because I got one result exactly as I expected, the issue is getting the others. I changed my Regex to what you have (actually without the `(?=(,|;))` ) and it still doesn't give me both matches. – Kyle Preiksa Sep 13 '13 at 19:23
Also, @Tharwen, is there a way to say something like the "one character that isn't" for a sequence? So in other words I can say, give me everything up until an "OU=" appears. – Kyle Preiksa Sep 13 '13 at 19:36
`.*(?=(OU=))` should give you everything before an 'OU='. – Tharwen Sep 13 '13 at 19:40
Also, I've added a code sample to my answer, which works on my machine (note that I updated the regex slightly after you suggested removing the lookahead). – Tharwen Sep 13 '13 at 19:42
Your current regex will break in certain situations. See the diagram on [debuggex](http://www.debuggex.com/r/vFNHrbKVTST6XqQ2/0). – Jerry Sep 13 '13 at 20:36
@Jerry Which situations? – Tharwen Sep 13 '13 at 20:39
I thought you'd understand by seeing the diagram... Well, if the `OU` parameter contains brackets, or pipes such as `OU = Technician (Advanced) Level`, you will get only `OU = Technician ` as match. – Jerry Sep 13 '13 at 20:41
@Jerry Oh, right. I assumed those characters were regex syntax in this case. I haven't used them for a while. Thanks though. – Tharwen Sep 13 '13 at 20:44
Yea, they turn into literals when in a character class. Though now I wonder, was there any reason to put `;` there as well? – Jerry Sep 13 '13 at 20:57
@Jerry That's because the entire list ends with a semicolon, not a comma. It's just there in case the last list item starts with OU. – Tharwen Sep 13 '13 at 21:02
Oh? Okay, I didn't see that anywhere :( I guess it doesn't cause any harm otherwise either – Jerry Sep 13 '13 at 21:04

score 4 · Accepted Answer · edited Sep 16 '13 at 07:55

Try using a negated character class instead. I get the feeling your character classes aren't behaving like you think they are...

subjectRx("OU=[^,]*", std::regex_constants::icase);

[^,]* matches any run of characters up to, but not including, the next comma.

As for the matches, try using a loop:

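while (std::regex_search (myCertSubject,OuMatches,subjectRx)) {
    // do something with OuMatches, then shorten the string being searched
    // (as in the documentation example further down), or the loop never ends
}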

I don't know much C++, but I found this documentation page which I think should be a bit more useful.

The piece of code it has here is

while (std::regex_search (s,m,e)) {
    for (auto x:m) std::cout << x << " ";
    std::cout << std::endl;
    s = m.suffix().str();
}

EDIT: I just realised that values can contain commas, as in the O= field, which won't work with [^,]. Instead, you can use this regex:

OU=(?:[^,]|,(?!(?:[^"]*"[^"]*"[^"]*)*$))*

You can see an example with O= here.

edited Sep 16 '13 at 07:55

answered Sep 13 '13 at 19:04

Jerry

That is much simpler and I should have thought of it, but I was building my expression piece by piece using a live preview software, however I am still getting OuMatches.count() == 1 – Kyle Preiksa Sep 13 '13 at 19:12
@KylePreiksa Updated my answer. Sorry if I'm not that familiar with C++ ^^; – Jerry Sep 13 '13 at 19:22
duh! I was missing the s=m.suffix().str(); part... That makes sense now. You have to specify that you have already found the first match somehow, and it is done by taking the part *after* your search and using that as your new string to search. Thank you so much. it's a lot to wrap your head around in a day ;) – Kyle Preiksa Sep 13 '13 at 19:33
@KylePreiksa I can't say it makes as much sense to me, but I do understand the logic, phew! You're welcome! I was glad to help ^^ – Jerry Sep 13 '13 at 19:35
@jons34yp Thank you so much for the informative answer. I have learned a lot through this experience. – Kyle Preiksa Sep 16 '13 at 13:17
