I compared libc++'s std::regex with the C regex library (regex.h) on Linux. First, the C version:
    #include <iostream>
    #include <chrono>
    #include <regex.h>

    int main()
    {
        const int count = 100000;

        // Compile the pattern once, outside the timed loop.
        regex_t exp;
        int rv = regcomp(&exp, R"_(([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?)_", REG_EXTENDED);
        if (rv != 0) {
            std::cout << "regcomp failed with " << rv << std::endl;
            return 1;  // exp is not usable if compilation failed
        }

        auto start = std::chrono::high_resolution_clock::now();
        for (int i = 0; i < count; i++)
        {
            regmatch_t match;
            const char *sz = "http://www.abc.com";
            if (regexec(&exp, sz, 1, &match, 0) == 0) {
                // std::cout << sz << " matches characters " << match.rm_so << " - " << match.rm_eo << std::endl;
            } else {
                // std::cout << sz << " does not match" << std::endl;
            }
        }
        auto end = std::chrono::high_resolution_clock::now();
        auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
        std::cout << elapsed.count() << std::endl;

        regfree(&exp);  // release the compiled pattern
        return 0;
    }
The result is roughly 60-70 milliseconds on my test machine.
Then I ran the same test with libc++'s std::regex:
    #include <iostream>
    #include <chrono>
    #include <regex>

    int main()
    {
        const int count = 100000;
        std::regex rgx(R"_(([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?)_", std::regex_constants::extended);

        auto start = std::chrono::high_resolution_clock::now();
        for (int i = 0; i < count; i++)
        {
            std::cmatch match;
            const char sz[] = "http://www.abc.com";
            if (regex_search(sz, match, rgx)) {
            } else {
            }
        }
        auto end = std::chrono::high_resolution_clock::now();
        auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
        std::cout << "regex_search: " << elapsed.count() << std::endl;

        start = std::chrono::high_resolution_clock::now();
        for (int i = 0; i < count; i++)
        {
            const char sz[] = "http://www.abc.com";
            if (regex_match(sz, rgx)) {
            } else {
            }
        }
        end = std::chrono::high_resolution_clock::now();
        elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
        std::cout << "regex_match: " << elapsed.count() << std::endl;
        return 0;
    }
The result is roughly 2 seconds for both regex_search and regex_match. That is about 30 times slower than C's regex.h library.
Is there anything wrong with my comparison? Is C++'s regex library unsuitable for high-performance use?
I can understand it being slower if the C++ regex library has not been optimized yet, but 30 times slower is just too much.
Thanks.
Hi all,
Thanks for answering.
Sorry for my mistake: I was using [] for the string in the C version too, but later changed it to a pointer and forgot to change the C++ code.
I made two changes:
- I moved const char sz[] out of the loop for both C and C++.
- I compiled with -O2 (I wasn't using any optimization before). The C library's implementation still takes around 60 milliseconds, but libc++'s regex now takes about 1 second for regex_search and about 150 milliseconds for regex_match.
This is still a bit slow, but the gap is not as large as in the original comparison.
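
For reference, here is a minimal sketch of how the revised C++ measurement could be structured once the string is hoisted out of the loop. The helper names (count_search_matches, count_full_matches, time_us) are hypothetical and not part of the original test. Returning a match count is an assumption on my part: it gives each loop an observable result, so under -O2 the compiler has less room to discard the work. It also uses steady_clock rather than high_resolution_clock, which is a swapped-in choice, not what the original test did.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <regex>

// Hypothetical helper: runs regex_search `iterations` times on `text`
// and returns how many of those calls found a match.
std::size_t count_search_matches(const std::regex& rgx, const char* text, int iterations)
{
    assert(iterations >= 0);
    std::size_t hits = 0;
    std::cmatch match;  // declared once; may let the library reuse its storage
    for (int i = 0; i < iterations; ++i) {
        if (std::regex_search(text, match, rgx)) {
            ++hits;
        }
    }
    return hits;
}

// Hypothetical helper: same loop, but with regex_match, which only
// succeeds when the pattern matches the entire string.
std::size_t count_full_matches(const std::regex& rgx, const char* text, int iterations)
{
    assert(iterations >= 0);
    std::size_t hits = 0;
    for (int i = 0; i < iterations; ++i) {
        if (std::regex_match(text, rgx)) {
            ++hits;
        }
    }
    return hits;
}

// Hypothetical timing wrapper: calls f() once and returns the elapsed
// wall-clock time in microseconds, measured with a monotonic clock.
template <typename F>
long long time_us(F&& f)
{
    auto start = std::chrono::steady_clock::now();
    f();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

With rgx and sz defined once before the timed section, a call such as time_us([&] { hits = count_search_matches(rgx, sz, 100000); }) times the whole loop, and printing hits afterwards keeps the result in use. This only changes how the measurement is taken; it does not by itself explain the remaining gap to regex.h.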