I have to work through a large file (several MB) and remove comments from it that are marked by a timestamp. An example:
blablabla 12:10:40 I want to remove this
blablabla some more
even more bla
After filtering, I would like it to look like this:
blablabla
blablabla some more
even more bla
The nicest way to do it should be using a Regex:
Dataout = Regex.Replace(Datain, "[012][0123456789]:[012345][0123456789]:[012345][0123456789].*", string.Empty, RegexOptions.Compiled);
Now this works perfectly for my purposes, but it's a bit slow. I'm assuming this is because the first two character classes, [012] and [0123456789], match a lot of the data (it's an ASCII file containing hexadecimal data, like "0045ab0123"). So the Regex gets a partial match on the first two characters far too often before failing on the ':'.
When I change the Regex to
Dataout = Regex.Replace(Datain, ":[012345][0123456789]:[012345][0123456789].*", string.Empty, RegexOptions.Compiled);
It gets an enormous speedup, probably because there are not many ':' characters in the file at all. Good! But I still need to check that the two characters before the first ':' are digits, and then delete from those digits to the end of the line.
So my question boils down to:
- how can I make Regex first search for the least frequent character, ':', and only after finding one, check the two characters before it?
Or maybe there's even a better way?
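For illustration, here is a rough sketch of the kind of non-Regex approach I have in mind: jump from ':' to ':' with IndexOf and only then look at the neighbouring characters. The names TimestampStripper, Strip and IsDigitUpTo are made up for this example, and I have not measured whether it is actually faster than the Regex on my data. It is meant to accept the same hh:mm:ss pattern as the first Regex above and to cut from the first hour digit to the end of the line.

```csharp
using System;
using System.Text;

static class TimestampStripper
{
    // True if s[i] exists and is an ASCII digit between '0' and maxDigit.
    static bool IsDigitUpTo(string s, int i, char maxDigit) =>
        i >= 0 && i < s.Length && s[i] >= '0' && s[i] <= maxDigit;

    // Removes everything from an "hh:mm:ss" stamp to the end of its line.
    public static string Strip(string input)
    {
        var output = new StringBuilder(input.Length);
        int copiedUpTo = 0;   // start of the text not yet copied to output
        int searchFrom = 0;   // where the next search for ':' begins

        while (true)
        {
            // Look for the rare character first.
            int colon = input.IndexOf(':', searchFrom);
            if (colon < 0) break;

            int start = colon - 2; // position of the first hour digit
            bool isStamp =
                start >= copiedUpTo &&
                IsDigitUpTo(input, start, '2') &&
                IsDigitUpTo(input, start + 1, '9') &&
                IsDigitUpTo(input, colon + 1, '5') &&
                IsDigitUpTo(input, colon + 2, '9') &&
                colon + 3 < input.Length && input[colon + 3] == ':' &&
                IsDigitUpTo(input, colon + 4, '5') &&
                IsDigitUpTo(input, colon + 5, '9');

            if (!isStamp)
            {
                searchFrom = colon + 1;
                continue;
            }

            // Copy the text before the stamp, then skip to the end of the line.
            output.Append(input, copiedUpTo, start - copiedUpTo);
            int lineEnd = input.IndexOf('\n', colon);
            if (lineEnd < 0) lineEnd = input.Length;
            copiedUpTo = lineEnd;
            searchFrom = lineEnd;
        }

        // Copy whatever follows the last removed comment.
        output.Append(input, copiedUpTo, input.Length - copiedUpTo);
        return output.ToString();
    }
}
```

It would be called as Dataout = TimestampStripper.Strip(Datain);. Like the Regex, it leaves any space in front of the timestamp in place and keeps the line break itself.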