I have a requirement to process a large 4.5 GB CSV file of flight data containing about 6 million rows.
I have a C# console app that reads each row and saves the flight and flight-leg data to various tables in our SQL Server database.
Currently I'm taking a basic line-by-line approach, calling a stored procedure per row to insert the data into the appropriate table.
string inputLine;
while ((inputLine = csvReader.ReadLine()) != null)
{
    // split the CSV line into fields, then save the row via two stored-procedure calls
    var fields = inputLine.Split(',');

    // insert the flight and get back its generated ID
    var flightModel = new FlightModel { FlightNumber = fields[0] };
    var newFlightId = await executeStoredProc(flightModel);

    // insert the leg row keyed to the new flight
    var flightLegModel = new LegModel { FlightId = newFlightId, Time = fields[1] };
    await executeStoredProc(flightLegModel);
}
However, this is taking about an hour per million rows, so roughly six hours for the full file, and the job needs to run once a night and ideally finish in one or two hours at most.
Splitting the file and processing it concurrently on another server would be expensive, so ideally I'm looking for a faster way to run the process above on a single machine.
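For reference, the direction I've been considering is buffering parsed rows and writing them in batches with SqlBulkCopy instead of one stored-procedure round trip per row. Below is only a rough sketch under my own assumptions: connectionString, the dbo.Flights destination table, the column names, and the 10,000-row batch size are all placeholders, and it only covers the flight rows, since I'm not sure how to get the generated FlightId values back for the leg rows this way.

using System;
using System.Data;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

// Buffer parsed rows in a DataTable and flush them in batches via SqlBulkCopy.
var connectionString = "..."; // placeholder for my real connection string

var flightsTable = new DataTable();
flightsTable.Columns.Add("FlightNumber", typeof(string));

using var csvReader = new StreamReader("flights.csv");
string inputLine;
while ((inputLine = csvReader.ReadLine()) != null)
{
    var fields = inputLine.Split(',');
    flightsTable.Rows.Add(fields[0]);

    // flush every 10,000 rows so memory stays bounded
    if (flightsTable.Rows.Count >= 10_000)
    {
        await BulkInsertAsync(flightsTable);
        flightsTable.Clear();
    }
}

// flush whatever is left after the final partial batch
if (flightsTable.Rows.Count > 0)
    await BulkInsertAsync(flightsTable);

async Task BulkInsertAsync(DataTable table)
{
    using var connection = new SqlConnection(connectionString);
    await connection.OpenAsync();
    using var bulkCopy = new SqlBulkCopy(connection)
    {
        DestinationTableName = "dbo.Flights", // placeholder table name
        BatchSize = 10_000
    };
    bulkCopy.ColumnMappings.Add("FlightNumber", "FlightNumber");
    await bulkCopy.WriteToServerAsync(table);
}

Presumably the leg rows would then need either a second bulk copy into a staging table followed by a set-based insert that joins back to the flights, or a natural key, which is part of what I'm unsure about. Is something along these lines the right approach, or is there a better option?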