4

The regex below matches in Perl, but it doesn't match in C++. After reading the "std::regex" class information on cplusplus.com, I think I might have to use regex_search instead, unless I pass flags to regex_match. Using regex_search seems to overcomplicate the simple match I want to perform. I would like to have the match on one line, as in Perl. Is there another one-line approach for performing regex matches in C++?

c++

std::string line1 = "interface GigabitEthernet0/0/0/3.50 l2transport";
if (std::regex_match(line1, std::regex("/^(?=.*\binterface\b)(?=.*\bl2transport\b)(?!.*\.100)(?!.*\.200)(?!.*\.300)(?!.*\.400).*$/")))
cout << line1;

perl

my $line1 = "interface GigabitEthernet0/0/0/3.50 l2transport";
if ($line1 =~ /^(?=.*\binterface\b)(?=.*\bl2transport\b)(?!.*\.100)(?!.*\.200)(?!.*\.300)(?!.*\.400).*$/ )
    print $line1;

I could create a method and pass the search criteria to return a true or false....

(Note: the reason i want to use C++ is because its much faster) interpreted vs compiled

(Update 2017-05-16: C++ wasn't faster than Perl in this example. The speed of your script just depends on how you arrange your code in either language. In my case, Perl was actually faster than C++. Both programs used regex and the same kind of layout. C++ seemed much slower, even when I used the Boost library.)

Jelani
  • "_(Note: the reason i want to use C++ is because its much faster) interpreted vs compiled_" Are you sure about that? Have you profiled it, or are you just guessing? Odds are the library you're calling via perl is implemented in c / c++ – Disillusioned Nov 29 '16 at 03:00
  • 2
    I'm not certain, but I suspect your C++ string literal needs to escape the `"\"` character, i.e. use `"\\"` to mean `"\"` – Disillusioned Nov 29 '16 at 03:01
  • 2
    Do you need to double escape (`\\b`) in C++? `\b` might be treated as the backspace character. – mob Nov 29 '16 at 03:01
  • Positive about that. I'm looping through thousands of lines in a text file and it's taking too long. I've recreated the same script in Java and C#, and Perl is just too slow. It's only because Java, C#, and C++ are compiled before execution. – Jelani Nov 29 '16 at 03:02
  • That regexp could be sped up in perl using 3 match expressions. Just an observation. `if (/\binterface\b/ && /\bl2transport\b/ && !/\.[1-4]00/)` – Chris Charley Nov 29 '16 at 18:08
  • Interesting... I'll test this out. I'm basically comparing two text files of 70K lines each, line by line, using inner and outer loops. I eventually changed that to an outer loop with map + grep. This improved the speed, but I believe C++ will be even faster because it's precompiled. – Jelani Nov 30 '16 at 02:56
  • I have to say that I am having terrible trouble with your last paragraph ("_Update 2017-05-16_"). It's part incorrect, part unclear, part misleading. Of course, on any one operation one can write bad and slow code in any one language, and for a very specific task it's hard to tell what's faster in general. And regex in Perl runs in compiled code. But what do you mean, when you compare languages in general, by "_isn't faster_" ? Try with just a simple (long enough) loop. Try other things. It's a good question otherwise, imo. – zdim May 17 '17 at 22:40
  • Your last two sentences are my exact point, and it's the point I was trying to make in my update. The speed of your script depends on how you arrange or optimize your code, and it's not necessarily determined by whether you use C++ over Perl. Using Perl, I looped through a billion lines of data and it took approx. 4 hours. To compare the speeds, I created a new C++ project with similar libraries, code, and layout, but it actually took 8 hours to loop through that data. I thought that by using C++ I could loop through that data much faster. – Jelani May 18 '17 at 01:13
  • I learned that I can achieve impressive speeds through optimization. In my example, I changed the Perl script from regexes to hash maps, profiled the code, removed several subroutines, and changed my code layout. There are plenty of blogs that compare the speeds of programming languages, but with ALMOST any language you can achieve amazing speeds through optimization rather than by switching to a different language. – Jelani May 18 '17 at 01:21
  • No, no, no, hang on. By mentioning "a simple loop" I meant to emphasize an inherent difference in speed. In a complex project, an _optimal_ use of Perl or C++ is generally going to be very different. (Well, most of the time even Perl and Python will need a different approach.) I would be absolutely astonished if a problem so demanding that it takes 4 hours for Perl could not be solved far, far, faster with C++. But, a suitable approach will be _very_ different. – zdim May 18 '17 at 06:18
  • Yes, the speed depends on "_how you arrange your code_" ... and what algorithms, approaches, and techniques you use. And the "best" ones usually differ across languages. I can see some (tiny) chance that a scripting language will beat C++ only in a very, very narrow problem, where there happens to be a highly optimized library that does practically everything (and there is no such library for C++, while of course I can't hand-roll code competitive with the library). So when the scripting language is used more or less as merely a wrapper. – zdim May 18 '17 at 06:21
  • Btw, I wonder what kind of processing is needed that takes 4 hours to go through a billion lines. That much is read in a few minutes. I don't mean to assault you here, just to offer comments. That's great if you found your solution :). – zdim May 18 '17 at 06:24
  • I'm taking data from 4 different systems and matching between 5 different data types. If street address 1 matches, but name or height doesn't match, place in bucket 1, and so on. I'm also collecting and building the data set from two of the systems. Originally, I used Java since the program was so massive, but I wanted to test the speed of the other languages. Actually, now that I recall, Java has been the fastest without any optimization. But as I mentioned earlier, I could have easily increased the speed in any of the languages through optimization without choosing a different language. – Jelani May 18 '17 at 13:37
  • OK, thanks for explaining it. The C++ can _blow through_ such a thing (in comparison), but that involves use of higher-level constructs -- you'd want to come up with suitable algorithms that will allow you to leverage STL, and you'd want to make good use of I/O streams. In a project of that complexity it _must_ be possible to write a compiled program that will be incomparably faster than a script -- your instinct is right there. But it will be considerably harder and far more time-consuming to write it, too. So you got Perl code to do it well enough -- that's great :). – zdim May 18 '17 at 22:53
  • Btw, when you mean to contact specific users use "@username" so that they get notified. The "owner" of the post (question or answer) always does, you don't need "@". If there is only one other user (than yourself) in an exchange then they also get notified (without an "at"), I think. – zdim May 18 '17 at 22:57
  • @zdim Moving forward with my career, I want to choose one scripting language (Perl) and two compiled languages (C# and Java) to master. But when faced with massive data mining projects, I've read that C++ is the language to use. I might revisit C++ to learn the complexities, so that I can leverage its speed. If C++ can blow through data mining projects faster than other languages, then I might spend more time learning how to optimize C++ code, and add that to my toolkit. – Jelani May 19 '17 at 01:51
  • @shoother Good to see articulate thoughts on languages and toolkits. I can't say what's better career wise, but C++ is the industry workhorse and standard when it comes to speed (for all I know). But it is very large and complex and there is a lot to learn there. (I like it a lot, partly for its complexities.) I've been told by the old and wise that most of what one learns in computing comes useful at some point. That's certainly the case with a language like C++. It does take time, though, and how to invest that is the big question. Perhaps do some research on languages for your field? – zdim May 19 '17 at 04:06

3 Answers

5

As others have mentioned, you don't need the two forward slashes when you write a regex in C++. I will also add another solution: you can use a raw string literal to write the regex.

For example:

std::string line1 = "interface GigabitEthernet0/0/0/3.50 l2transport";
std::regex pattern (R"(^(?=.*\binterface\b)(?=.*\bl2transport\b)(?!.*\.100)(?!.*\.200)(?!.*\.300)(?!.*\.400).*$)");
if (std::regex_match(line1, pattern)) {
    std::cout << line1 << '\n';
}

By using a raw string, you prevent C++ from interpreting escape sequences inside the string, so your regex is left intact.
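To make the equivalence concrete, here is a minimal sketch showing that the raw literal and the double-escaped literal produce the same string, so either can be passed to `std::regex`. The names `kEscaped`, `kRaw`, and `is_wanted_interface` are made up for illustration and are not part of the original code.

```cpp
#include <regex>
#include <string>

// Ordinary literal: every regex backslash has to be doubled.
const std::string kEscaped =
    "^(?=.*\\binterface\\b)(?=.*\\bl2transport\\b)(?!.*\\.100)(?!.*\\.200)(?!.*\\.300)(?!.*\\.400).*$";

// Raw literal: the text between R"( and )" is taken verbatim.
const std::string kRaw =
    R"(^(?=.*\binterface\b)(?=.*\bl2transport\b)(?!.*\.100)(?!.*\.200)(?!.*\.300)(?!.*\.400).*$)";

// Hypothetical helper: true when the whole line satisfies the pattern.
bool is_wanted_interface(const std::string& line) {
    // Compile once and reuse; constructing a std::regex is comparatively expensive.
    static const std::regex pattern(kRaw);
    return std::regex_match(line, pattern);
}
```

With this helper the check at the call site stays on one line, e.g. `if (is_wanted_interface(line1)) std::cout << line1 << '\n';`.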

smac89
4

@craig young is correct. In a C++ string literal, the regex "\" character must be written as a double backslash, "\\".

Also, in C++ it's not necessary to have the outer "/" surrounding the pattern. I used the code below to make it match. Thanks.

if (std::regex_match(line1, regex("^(?=.*\\binterface\\b)(?=.*\\bl2transport\\b)(?!.*\\.100)(?!.*\\.200)(?!.*\\.300)(?!.*\\.400).*$")))
Jelani
  • Saying the `/` aren't necessary implies they will be ignored. Is that the case, or will the engine try to match a `/`? – ikegami Nov 29 '16 at 03:35
3

Problem 1

To produce the string ...\b..., one needs to use the string literal "...\\b...". Just like you'd use $s =~ "...\\b..." in Perl, you need to use regex("...\\b...") in C++.
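A small sketch of why the single backslash fails, assuming an ordinary (non-raw) string literal: the compiler turns `"\b"` into one backspace character before the regex engine ever sees it. The names `kSingle`, `kDouble`, and `has_word_interface` are hypothetical.

```cpp
#include <regex>
#include <string>

// "\b" in an ordinary literal is a single character: backspace (0x08).
const std::string kSingle = "\b";

// "\\b" is two characters, a backslash followed by 'b', which the
// regex engine then reads as a word-boundary assertion.
const std::string kDouble = "\\b";

// Hypothetical helper: searches for "interface" wrapped in the given
// boundary text, so the two spellings can be compared side by side.
bool has_word_interface(const std::string& line, const std::string& boundary) {
    const std::regex pattern(boundary + "interface" + boundary);
    return std::regex_search(line, pattern);
}
```

Called with `kDouble`, the helper finds the word in the sample line; called with `kSingle`, the pattern asks for literal backspace characters around `interface`, which the line does not contain.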

Problem 2

The / aren't actually part of the pattern. (In Perl, they delimit the match operator, which is one of the operators that accepts a regular expression pattern.) As such, they have no business being used here unless you want to match a /.
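As a quick illustration, assuming the default ECMAScript grammar of `std::regex`, a `/` in the pattern is an ordinary character that must appear in the input. The helper name `full_match` is made up for this sketch.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the entire text matches the pattern.
bool full_match(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}
```

Here `full_match("abc", "abc")` succeeds, `full_match("abc", "/abc/")` fails because the input has no slashes, and `full_match("/abc/", "/abc/")` succeeds because the slashes are matched literally.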

Fixed and Simplified

regex("^(?=.*\\binterface\\b)(?=.*\\bl2transport\\b)(?!.*\\.[1-4]00)")
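One caveat worth checking, sketched below: the simplified pattern consists only of an anchor and lookaheads, so it consumes no characters. `std::regex_match` requires the pattern to cover the whole input, so this form seems to need `std::regex_search` (or a trailing `.*`). The names `kSimplified`, `search_hit`, and `match_hit` are hypothetical.

```cpp
#include <regex>
#include <string>

// The simplified pattern, written as a raw string so the backslashes
// do not need doubling. It is zero-width: anchor plus lookaheads only.
const std::regex kSimplified(
    R"(^(?=.*\binterface\b)(?=.*\bl2transport\b)(?!.*\.[1-4]00))");

// regex_search succeeds if the pattern matches anywhere; the anchor
// pins it to the start of the line.
bool search_hit(const std::string& line) {
    return std::regex_search(line, kSimplified);
}

// regex_match demands that the match span the entire line, which a
// zero-width pattern cannot do for a non-empty line.
bool match_hit(const std::string& line) {
    return std::regex_match(line, kSimplified);
}
```

For the sample line from the question, `search_hit` reports a match while `match_hit` does not, so the one-line check would read `if (std::regex_search(line1, kSimplified))`.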
ikegami