I am trying to parse a build log file to extract some information using regular expressions. I am using a regular expression like ("( {9}time)(.+)(c1xx\\.dll+)(.+)s")
to match a line like time(D:\Program Files\Microsoft Visual Studio 11.0\VC\bin\c1xx.dll)=0.047s
This takes about 120 s to complete on a file with about 19,000 lines, some of which are quite long. The basic problem is that when I cut the number of lines to about 9,000 using some conditions, it did not improve anything; it actually made it worse. I do not understand this. If I remove the regular expressions altogether, just scanning the file takes about 6 s, so the regular expressions are the main time consumer here. Why does the time not go down at least somewhat when I remove half of the lines?
Also, can anyone tell me which kind of regular expression is faster, a more generic one or a more specific one? For example, I can also match the line time(D:\Program Files\Microsoft Visual Studio 11.0\VC\bin\c1xx.dll)=0.047s
uniquely in the file using the regex ("(.+)(c1xx.dll)(.+)"),
but that makes the whole thing run even slower, whereas something like ("( {9}time)(.+)(c1xx\\.dll+)(.+)")
makes it run slightly faster.
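To make the comparison concrete, here is a minimal sketch that checks both the generic and the specific pattern against the sample line. The helper names `whole_line_matches` and `sample_line` are hypothetical, and the nine leading spaces are an assumption taken from the ` {9}time` prefix of the pattern, not from the log itself.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the pattern matches the whole line.
// The regex is compiled on every call, which is fine for a check like
// this but should be avoided inside a hot loop.
bool whole_line_matches(const std::string& line, const std::string& pattern)
{
    const std::regex re(pattern);
    return std::regex_match(line, re);
}

// Sample line from the log. ASSUMPTION: it is indented by nine spaces,
// as the " {9}time" prefix of the specific pattern implies.
std::string sample_line()
{
    return std::string(9, ' ') +
           "time(D:\\Program Files\\Microsoft Visual Studio 11.0\\VC\\bin\\c1xx.dll)=0.047s";
}
```

Both `whole_line_matches(sample_line(), "(.+)(c1xx.dll)(.+)")` and `whole_line_matches(sample_line(), "( {9}time)(.+)(c1xx\\.dll+)(.+)s")` should return true, so the two patterns differ only in how much work the engine does to reach that answer.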
I am using the C++11 regex library, mostly the regex_match function.
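As far as I understand, regex_match only succeeds when the pattern covers the entire line, which is why the patterns above carry a (.+) on both sides; regex_search accepts a match anywhere in the line. A small sketch of the difference, with hypothetical wrapper names:

```cpp
#include <regex>
#include <string>

// regex_match succeeds only if the pattern consumes the entire line.
bool matches_entire_line(const std::string& line, const std::regex& re)
{
    return std::regex_match(line, re);
}

// regex_search succeeds if the pattern matches anywhere inside the line.
bool matches_somewhere(const std::string& line, const std::regex& re)
{
    return std::regex_search(line, re);
}
```

With the bare pattern `c1xx\\.dll`, `matches_somewhere` accepts a line that merely contains the file name, while `matches_entire_line` rejects it unless the line is exactly `c1xx.dll`.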
    regex c1xx("( {9}time)(.+)(c1xx\\.dll+)(.+)s");
    auto start = system_clock::now();
    int linecount = 0;
    while (getline(inFile, currentLine))
    {
        // Try to match the whole line against the pattern
        if (regex_match(currentLine.c_str(), c1xx))
        {
            linecount++;
            // Do something, just insert it into a vector
        }
    }
    auto end = system_clock::now();
    auto elapsed = duration_cast<milliseconds>(end - start);
    cout << "Time taken for parsing first log = " << elapsed.count() << " ms" << " lines = " << linecount << endl;
Output:
Time taken for parsing first log = 119416 ms lines = 19617
    regex c1xx("( {9}time)(.+)(c1xx\\.dll+)(.+)s");
    auto start = system_clock::now();
    int linecount = 0;
    while (getline(inFile, currentLine))
    {
        // Skip long lines before they reach the regex
        if (currentLine.size() > 200)
        {
            continue;
        }
        if (regex_match(currentLine.c_str(), c1xx))
        {
            linecount++;
            // Do something, just insert it into a vector
        }
    }
    auto end = system_clock::now();
    auto elapsed = duration_cast<milliseconds>(end - start);
    cout << "Time taken for parsing first log = " << elapsed.count() << " ms" << " lines = " << linecount << endl;
Output:
Time taken for parsing first log = 131613 ms lines = 9216
Why is it taking more time in the second case?
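One way this could be narrowed down is to time each regex_match call separately and see whether a handful of lines account for most of the total. The following is only a sketch under that assumption; `SlowestLine` and `find_slowest_line` are hypothetical names, and the lines are assumed to have been read into a vector first.

```cpp
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical result type for the diagnostic below.
struct SlowestLine {
    std::size_t index = 0;                 // position of the slowest line
    std::chrono::nanoseconds elapsed{0};   // time spent in regex_match on it
    std::size_t matches = 0;               // number of lines that matched
};

// Times regex_match on every line and remembers the slowest one.
SlowestLine find_slowest_line(const std::vector<std::string>& lines,
                              const std::regex& re)
{
    using clock = std::chrono::steady_clock;
    SlowestLine result;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        const auto start = clock::now();
        const bool matched = std::regex_match(lines[i], re);
        const auto elapsed =
            std::chrono::duration_cast<std::chrono::nanoseconds>(clock::now() - start);
        if (matched) {
            ++result.matches;
        }
        if (elapsed > result.elapsed) {
            result.elapsed = elapsed;
            result.index = i;
        }
    }
    return result;
}
```

Comparing `result.elapsed` for the slowest line with the total run time would show whether the cost is spread evenly across lines or concentrated in a few of them.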