9

I am trying to parse a certificate in C++ and decided it was a good opportunity to learn regex. I just learned about regex an hour or so ago, so excuse my lack of knowledge.

I am looking for all of the OUs associated with an entry.

I am doing the following:

std::smatch OuMatches;
std::string myCertSubject = "O=\"My Company, Incorporated\", OU=Technician Level - A3, OU=Access Level - 1, CN=\"Name, My\", E=namem@company.com";

std::regex subjectRx("OU=[[:w:]|[:s:]|[:digit:]|-]*", std::regex_constants::icase);
bool foundOU = std::regex_search(myCertSubject, OuMatches, subjectRx);

Why won't this give me all of the results (2) that match my regex? Is there a way to get them all?

Kyle Preiksa
  • You should post expected positive and negative matches. – progrenhard Sep 13 '13 at 18:59
  • afaik std::regex is not implemented in libstdc++ yet. Use boost::regex – hellow Sep 13 '13 at 19:04
  • @progenhard I expect that the two OU=... examples in my string above will match, but only until the next OU, CN, etc. Something like the CN entry above would NOT match, and now that I am thinking about it, nor would any OU containing other punctuation. Like I said I am new to regex, but what I want to do is get all of the OU fields, not including the following entry (OU=, CN=, E=, etc.) – Kyle Preiksa Sep 13 '13 at 19:04
  • possible duplicate of [How to match multiple results using std::regex](http://stackoverflow.com/questions/21667295/how-to-match-multiple-results-using-stdregex) – Behrouz.M May 29 '15 at 08:52

2 Answers

6

It looks like you're just trying to get a string that looks like OU=XXXXXXXXXXXXXXXXX followed by a comma or a semicolon.

This regex will do that:

OU=[^,;]+

What this means is the string OU=, followed by at least one character that isn't a comma or semicolon:

[^,;]+  

Here's a code sample using this regex to print the matches (based on the example here):

std::smatch OuMatches;
std::string myCertSubject = "O=\"My Company, Incorporated\", OU=Technician Level - A3, OU=Access Level - 1, CN=\"Name, My\", E=namem@company.com";
std::regex subjectRx("OU=[^,;]+", std::regex_constants::icase);

std::regex_iterator<std::string::iterator> it (myCertSubject.begin(), myCertSubject.end(), subjectRx);
std::regex_iterator<std::string::iterator> end;

while (it != end)
{
    std::cout << it->str() << std::endl;
    ++it;
}
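
For completeness, here is a minimal self-contained sketch of the same iterator approach, wrapped in a function so the matches can be collected (for display in a list, say). The helper name `find_ous` is hypothetical and not part of the snippet above, and `std::sregex_iterator` is simply the standard alias for a regex iterator over `std::string::const_iterator`.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): collects every
// "OU=..." field found in a certificate subject string.
std::vector<std::string> find_ous(const std::string& subject)
{
    // "OU=" followed by one or more characters that are neither ',' nor ';'.
    static const std::regex ou_rx("OU=[^,;]+", std::regex_constants::icase);

    std::vector<std::string> result;
    // std::sregex_iterator walks all non-overlapping matches in order;
    // a default-constructed iterator marks the end of the sequence.
    for (std::sregex_iterator it(subject.begin(), subject.end(), ou_rx), end;
         it != end; ++it)
    {
        result.push_back(it->str());  // text of the whole match
    }
    return result;
}
```

Calling `find_ous(myCertSubject)` on the subject string from the question should return one entry per OU field, without the trailing comma.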
Tharwen
  • That is correct, however, my regex seems to work (even if it is overly complicated and non-ideal currently) my issue is with accessing the results. I *will* encounter multiple OU's and I wanted to get them all (ie. for display in a list or something) – Kyle Preiksa Sep 13 '13 at 19:15
  • This website doesn't think your regex works: http://www.debuggex.com/r/g02uW9S4tvj5La15/0 Are they implemented differently in C++? – Tharwen Sep 13 '13 at 19:20
  • Great resource, thank you. It at least somewhat worked because I got one result exactly as I expected, the issue is getting the others. I changed my Regex to what you have (actually without the `(?=(,|;))` ) and it still doesn't give me both matches. – Kyle Preiksa Sep 13 '13 at 19:23
  • Also, @Tharwen, is there a way to say something like the "one character that isn't" for a sequence? So in other words I can say, give me everything up until an "OU=" appears. – Kyle Preiksa Sep 13 '13 at 19:36
  • `.*(?=(OU=))` should give you everything before an 'OU='. – Tharwen Sep 13 '13 at 19:40
  • Also, I've added a code sample to my answer, which works on my machine (note that I updated the regex slightly after you suggested removing the lookahead). – Tharwen Sep 13 '13 at 19:42
  • Your current regex will break in certain situations. See the diagram on [debuggex](http://www.debuggex.com/r/vFNHrbKVTST6XqQ2/0). – Jerry Sep 13 '13 at 20:36
  • @Jerry Which situations? – Tharwen Sep 13 '13 at 20:39
  • I thought you'd understand by seeing the diagram... Well, if the `OU` parameter contains brackets, or pipes such as `OU = Technician (Advanced) Level`, you will get only `OU = Technician ` as match. – Jerry Sep 13 '13 at 20:41
  • @Jerry Oh, right. I assumed those characters were regex syntax in this case. I haven't used them for a while. Thanks though. – Tharwen Sep 13 '13 at 20:44
  • Yea, they turn into literals when in a character class. Though now I wonder, was there any reason to put `;` there as well? – Jerry Sep 13 '13 at 20:57
  • @Jerry That's because the entire list ends with a semicolon, not a comma. It's just there in case the last list item starts with OU. – Tharwen Sep 13 '13 at 21:02
  • Oh? Okay, I didn't see that anywhere :( I guess it doesn't cause any harm otherwise either – Jerry Sep 13 '13 at 21:04
4

Try using a negated character class instead. I get the feeling your character classes aren't behaving like you think they are...

subjectRx("OU=[^,]*", std::regex_constants::icase);

[^,]* matches any run of characters up to, but not including, the next comma.

As for the matches, try using a loop:

while (std::regex_search(myCertSubject, OuMatches, subjectRx)) {
    // do something with OuMatches[0], then continue after this match
    myCertSubject = OuMatches.suffix().str();
}

I don't know much C++, but I found this documentation page which I think should be a bit more useful.

The piece of code it has here is

while (std::regex_search (s,m,e)) {
    for (auto x:m) std::cout << x << " ";
    std::cout << std::endl;
    s = m.suffix().str();
}
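
Applied to the string from the question, that loop might look like the sketch below. The function name `find_ous_by_suffix` is made up for illustration, and the input is taken by value so the caller's string is left untouched while the local copy is trimmed.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: repeats regex_search, cutting the input down to the
// text after each match so the next search finds the following OU.
std::vector<std::string> find_ous_by_suffix(std::string subject)
{
    const std::regex ou_rx("OU=[^,]*", std::regex_constants::icase);

    std::vector<std::string> result;
    std::smatch m;
    while (std::regex_search(subject, m, ou_rx)) {
        result.push_back(m.str());   // the whole match, e.g. "OU=..."
        // suffix().str() builds a new string before the assignment, so it is
        // safe even though m refers to the old contents of subject.
        subject = m.suffix().str();
    }
    return result;
}
```

Each pass copies the remainder of the string, so for long inputs an iterator-based approach is likely cheaper, but for a certificate subject the difference should not matter.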

EDIT: I just realised that the parameters can contain commas, as in O=, which won't work with [^,]. Instead, you can use this regex:

OU=(?:[^,]|,(?!(?:[^"]*"[^"]*"[^"]*)*$))*

You can see an example with O= here.

Jerry
  • That is much simpler and I should have thought of it, but I was building my expression piece by piece using a live preview software, however I am still getting OuMatches.count() == 1 – Kyle Preiksa Sep 13 '13 at 19:12
  • @KylePreiksa Updated my answer. Sorry if I'm not that familiar with C++ ^^; – Jerry Sep 13 '13 at 19:22
  • duh! I was missing the s=m.suffix().str(); part... That makes sense now. You have to specify that you have already found the first match somehow, and it is done by taking the part *after* your search and using that as your new string to search. Thank you so much. it's a lot to wrap your head around in a day ;) – Kyle Preiksa Sep 13 '13 at 19:33
  • @KylePreiksa I can't say it makes as much sense to me, but I do understand the logic, phew! You're welcome! I was glad to help ^^ – Jerry Sep 13 '13 at 19:35
  • @jons34yp Thank you so much for the informative answer. I have learned a lot through this experience. – Kyle Preiksa Sep 16 '13 at 13:17