You mentioned that the multiline regex is taking too long and asked about the state-machine approach. Here is some code that does the job with a function instead. It walks each line one character at a time and counts the single quotes seen so far. An odd count means the current character is inside a quoted field, so any pipe found there is dropped. (The function could probably use some cleaning up, but it shows the idea and runs faster than the regex.) In my testing, the regex without the multiline option processed 1,000,000 lines (in memory, not writing to a file) in about 34 seconds. The state-machine approach took about 4 seconds.
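
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;

    // Removes every '|' that sits between single quotes. The running quote
    // count is the "state": an odd count means we are inside a quoted field.
    string RemoveInternalPipe(string line)
    {
        int count = 0;
        var temp = new List<char>(line.Length);
        foreach (var c in line)
        {
            if (c == '\'')
            {
                ++count; // each quote flips between inside and outside
            }
            // Drop pipes while inside quotes (odd number of quotes so far).
            if (c == '|' && count % 2 != 0) continue;
            temp.Add(c);
        }
        return new string(temp.ToArray());
    }

    // ReadLines streams the input, so the whole file is never held in memory.
    File.WriteAllLines(@"yourOutputFile",
        File.ReadLines(@"yourInputFile").Select(x => RemoveInternalPipe(x)));
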
To compare the performance against the Regex version (without the multiline option), you could run this code:
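
    using System.IO;
    using System.Linq;
    using System.Text.RegularExpressions;

    // Matches a '|' that is preceded by an odd number of single quotes on the line.
    var regex = new Regex(@"(?<=^[^']*'([^']*'[^']*')*[^']*)\|");
    File.WriteAllLines(@"yourOutputFile",
        File.ReadLines(@"yourInputFile").Select(x => regex.Replace(x, string.Empty)));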
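If you want to reproduce the timing comparison yourself, here is a rough benchmark sketch. It is a hypothetical harness, not the one behind the numbers above: the test data (`LineCount` and the sample line format) is made up for illustration, and the state-machine function is a slightly tidied variant that toggles a `bool` instead of counting quotes and builds the result with a `StringBuilder` instead of a `List<char>`. Both approaches run over the same in-memory lines, are timed with `Stopwatch`, and their outputs are compared to confirm they agree.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Tidied state-machine variant: a bool flag replaces the quote counter.
static string RemoveInternalPipe(string line)
{
    var sb = new StringBuilder(line.Length);
    bool insideQuotes = false;
    foreach (var c in line)
    {
        if (c == '\'') insideQuotes = !insideQuotes; // each quote flips the state
        if (c == '|' && insideQuotes) continue;      // drop pipes inside quotes
        sb.Append(c);
    }
    return sb.ToString();
}

// Hypothetical test data: every line has quoted fields that contain pipes.
const int LineCount = 100_000;
var lines = Enumerable.Range(0, LineCount)
    .Select(i => $"{i}|'a|b|c'|plain|'x|y'")
    .ToList();

// Same pattern as above: a pipe preceded by an odd number of quotes.
var regex = new Regex(@"(?<=^[^']*'([^']*'[^']*')*[^']*)\|");

var sw = Stopwatch.StartNew();
var viaStateMachine = lines.Select(x => RemoveInternalPipe(x)).ToList();
sw.Stop();
Console.WriteLine($"State machine: {sw.ElapsedMilliseconds} ms");

sw.Restart();
var viaRegex = lines.Select(x => regex.Replace(x, string.Empty)).ToList();
sw.Stop();
Console.WriteLine($"Regex:         {sw.ElapsedMilliseconds} ms");

// Both approaches should produce identical output.
Console.WriteLine($"Outputs match: {viaStateMachine.SequenceEqual(viaRegex)}");
```

Timings depend on line length, how many pipes each line contains, and the machine, so expect absolute numbers that differ from the ones quoted above. Passing `RegexOptions.Compiled` to the `Regex` constructor may narrow the gap somewhat, but the regex still has to rescan the start of the line for every pipe, while the state machine makes a single pass.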