With LINQ you can find your bad rows like this:
using System;
using System.Collections.Generic;
using System.Linq;
public class Program
{
    public static void Main()
    {
        int expectedNumberOfTabs = 5;
        List<string> rows = new List<string>
        {
            "col1 \t col2 \t col3 \t col4 \t col5 \t col6",
            "col1 \t col2 \t col3 \t col4 \t col5 \t col6",
            "col1 \t col2 \t col3 \t col4",
            "col1 \t col2 \t col3 \t col4 \t col5 \t col6 \t col7",
            "col1 \t col2 \t col3 \t col4 \t col5 \t col6",
            "col1 \t col2 \t col3 \t col4 \t col5",
            "col1 \t col2 \t col3 \t col4 \t col5 \t col6",
        };
        // A row is "bad" when its tab count differs from the expected count
        var badRows = rows.Where(row => row.Count(c => c == '\t') != expectedNumberOfTabs);
        foreach (var badRow in badRows)
        {
            // Fix the bad row here; for now just print it
            Console.WriteLine(badRow);
        }
    }
}
Results:
col1 col2 col3 col4
col1 col2 col3 col4 col5 col6 col7
col1 col2 col3 col4 col5
Of course, you won't want to read all 8,000,000+ rows into memory at once. You'll more likely read and handle one row at a time, so the part of this snippet you're really interested in is:
row.Count(c => c == '\t') != expectedNumberOfTabs
This expression identifies a "bad" row (one whose tab count differs from the expected number) for you to fix.
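If you want to run that same check over the real file without loading it into memory, `File.ReadLines` returns a lazily evaluated `IEnumerable<string>`, so the LINQ filter can consume it one line at a time. The sketch below also reports 1-based line numbers, which makes bad rows easier to locate among millions. The `BadRowFinder` class, the `FindBadRows` method, and the sample data are hypothetical and only for illustration; the demo writes a small temporary file so it runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class BadRowFinder
{
    // Counts the tab characters in a single row.
    public static int CountTabs(string row) => row.Count(c => c == '\t');

    // Lazily yields (1-based line number, row) for every row whose tab count
    // differs from expectedNumberOfTabs. Only the current row is held in memory
    // when the source is itself lazy (e.g. File.ReadLines).
    public static IEnumerable<(int LineNumber, string Row)> FindBadRows(
        IEnumerable<string> rows, int expectedNumberOfTabs)
    {
        return rows
            .Select((row, index) => (LineNumber: index + 1, Row: row))
            .Where(r => CountTabs(r.Row) != expectedNumberOfTabs);
    }
}

public class Program
{
    public static void Main()
    {
        // Hypothetical sample data written to a temp file so the demo is self-contained.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[]
        {
            "col1\tcol2\tcol3\tcol4\tcol5\tcol6",
            "col1\tcol2\tcol3\tcol4",
            "col1\tcol2\tcol3\tcol4\tcol5\tcol6\tcol7",
        });
        try
        {
            // File.ReadLines streams the file instead of reading it all at once.
            foreach (var (lineNumber, row) in BadRowFinder.FindBadRows(File.ReadLines(path), 5))
            {
                Console.WriteLine($"Line {lineNumber}: {row}");
            }
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Taking an `IEnumerable<string>` rather than a file path keeps the filter independent of where the rows come from, so it works the same on a `List<string>` like the one in the first snippet.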
Sample Approach
Because you're dealing with so much data, you may want to copy the lines from the original file to a new file, fixing bad lines as you encounter them. Once the new "fixed" file is complete, delete the original file, rename the fixed file to the original name, and import it into your database.
using System.IO;
using System.Linq;
public class Program
{
    public static void Main()
    {
        int expectedNumberOfTabs = 5;
        string originalFile = "MyFile.txt";
        string originalFileFixed = "MyFileFixed.txt";
        using (StreamReader sr = new StreamReader(originalFile))
        using (StreamWriter sw = new StreamWriter(originalFileFixed))
        {
            string line;
            // Read one line at a time until ReadLine returns null (end of file)
            while ((line = sr.ReadLine()) != null)
            {
                if (line.Count(c => c == '\t') != expectedNumberOfTabs)
                {
                    // line = ...Fix the line
                }
                sw.WriteLine(line);
            }
        }
        // Both streams are closed at this point, so the files can be deleted and renamed
        // Delete original file
        File.Delete(originalFile);
        // Rename the fixed file back to the original file
        File.Move(originalFileFixed, originalFile);
        // Import the file
    }
}
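The snippet leaves the actual repair as `// line = ...Fix the line` because the right fix depends on your data. As one hypothetical rule (an assumption, not something the question specifies), rows with too few tabs can be padded with empty trailing fields, while rows with too many tabs are left unchanged and flagged, since there is no way to tell which tab is the extra one. `LineFixer` and `TryFixLine` are illustrative names only:

```csharp
using System;
using System.Linq;

public static class LineFixer
{
    // Hypothetical repair rule (assumption):
    // - too few tabs: append empty trailing fields until the count matches;
    // - too many tabs: return the line unchanged and report failure,
    //   because the spurious tab cannot be identified automatically.
    public static bool TryFixLine(string line, int expectedNumberOfTabs, out string fixedLine)
    {
        int tabs = line.Count(c => c == '\t');
        if (tabs <= expectedNumberOfTabs)
        {
            // new string('\t', 0) is empty, so already-correct lines pass through unchanged
            fixedLine = line + new string('\t', expectedNumberOfTabs - tabs);
            return true;
        }
        fixedLine = line;
        return false;
    }
}

public class Program
{
    public static void Main()
    {
        string[] samples =
        {
            "a\tb\tc\td\te\tf",      // correct: 5 tabs
            "a\tb\tc\td",            // too short: gets padded
            "a\tb\tc\td\te\tf\tg",   // too long: flagged
        };
        foreach (string line in samples)
        {
            bool ok = LineFixer.TryFixLine(line, 5, out string fixedLine);
            // Show tabs as '|' so the padding is visible in the console
            Console.WriteLine($"{(ok ? "ok" : "UNFIXABLE")}: {fixedLine.Replace("\t", "|")}");
        }
    }
}
```

Inside the read loop, the placeholder could become something like `if (!LineFixer.TryFixLine(line, expectedNumberOfTabs, out line)) { /* log the line for review */ }`. Returning a `bool` with an `out` parameter follows the familiar `TryParse` pattern, so unfixable rows can be logged instead of silently dropped.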