I am running a C++ program in Visual Studio. I supply a regex and scan a file that is over 2 million lines long for lines that match it. Here is the code:
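#include <fstream>
#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main() {
    ifstream myfile("file.log");
    if (myfile.is_open())
    {
        int order_count = 0;
        regex pat(R"(.*(SOME)(\s)*(TEXT).*)");
        // Read the file line by line and count the lines that match.
        for (string line; getline(myfile, line);)
        {
            smatch matches;
            if (regex_search(line, matches, pat)) {
                order_count++;
            }
        }
        myfile.close();
        cout << order_count;
    }
    return 0;
}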
The program should search the file for matching lines and count them. I have a Python version of the program that does this within 4 seconds using the same regex. I have been waiting around 5 minutes for the above C++ code to finish and it still hasn't. It is not stuck in an infinite loop: I had it print its current line number at intervals, and it is progressing. Is there a different way I should write the above code?
EDIT: This is run in release mode.
EDIT: Here is the python code:
import re

class PythonLogParser:
    def __init__(self, filename):
        self.filename = filename

    def open_file(self):
        f = open(self.filename)
        return f

    def count_stuff(self):
        f = self.open_file()
        order_pattern = re.compile(r'(.*(SOME)(\s)*(TEXT).*)')
        order_count = 0
        for line in f:
            # match() anchors at the start of the line
            if order_pattern.match(line) is not None:
                order_count += 1
        print('Number of Orders (\'ORDER\'): {0}\n'.format(order_count))
        f.close()
The program finally stopped running. What's most disconcerting is that the output is incorrect (I know what the correct value should be).
Perhaps using regex for this problem is not the best solution. I will update if I find a solution that works better.
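For comparison, here is a minimal sketch of what a regex-free check could look like. It assumes the goal is simply to find "SOME", followed by optional whitespace, followed by "TEXT" anywhere in the line. The function name contains_some_text is hypothetical, and std::isspace in the default C locale is assumed to be close enough to \s for this log format. I have not timed it against the real file.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `line` contains "SOME", then zero or
// more whitespace characters, then "TEXT". Intended to accept the same lines
// as the pattern SOME\s*TEXT, without using <regex>.
bool contains_some_text(const std::string& line) {
    static const std::string head = "SOME";
    static const std::string tail = "TEXT";

    std::size_t pos = line.find(head);
    while (pos != std::string::npos) {
        // Skip any whitespace that follows this occurrence of "SOME".
        std::size_t i = pos + head.size();
        while (i < line.size() &&
               std::isspace(static_cast<unsigned char>(line[i]))) {
            ++i;
        }
        // i <= line.size() here, so compare() cannot throw.
        if (line.compare(i, tail.size(), tail) == 0) {
            return true;
        }
        // Otherwise retry from the next occurrence of "SOME".
        pos = line.find(head, pos + 1);
    }
    return false;
}
```

In the counting loop, the regex call would then become if (contains_some_text(line)) order_count++;.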
EDIT: Based on the answer by @ecatmur, I replaced regex_search with regex_match (and dropped the unused smatch), and the C++ program ran much faster.
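#include <fstream>
#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main() {
    ifstream myfile("file.log");
    if (myfile.is_open())
    {
        int order_count = 0;
        regex pat(R"(.*(SOME)(\s)*(TEXT).*)");
        for (string line; getline(myfile, line);)
        {
            // regex_match must match the whole line, so there is no retry at every offset.
            if (regex_match(line, pat)) {
                order_count++;
            }
        }
        myfile.close();
        cout << order_count;
    }
    return 0;
}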
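A possible further simplification, sketched under the assumption that only the count matters: regex_search already looks for a match anywhere in the line, so the leading and trailing .* do not change which lines match, and the capture groups are never read. Dropping them may reduce backtracking. The function name count_matching_lines is hypothetical, and I have not benchmarked this against the version above.

```cpp
#include <cstddef>
#include <istream>
#include <regex>
#include <string>

// Hypothetical helper: counts the lines of `in` that contain "SOME",
// optional whitespace, then "TEXT".
std::size_t count_matching_lines(std::istream& in) {
    // No leading/trailing .* and no capture groups: regex_search scans the
    // line for the pattern itself. The regex is built once and reused.
    static const std::regex pat(R"(SOME\s*TEXT)",
                                std::regex::ECMAScript | std::regex::optimize);
    std::size_t count = 0;
    for (std::string line; std::getline(in, line);) {
        if (std::regex_search(line, pat)) {
            ++count;
        }
    }
    return count;
}
```

It would be called with the open stream, for example cout << count_matching_lines(myfile);.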