I've been investigating TPL as a means of quickly generating a large volume of files. I have about 10 million rows in a database, events which belong to patients, and I want to write each event to its own text file at d:\EVENTS\PATIENTID\EVENTID.txt
I'm using two nested Parallel.ForEach loops: the outer one works through the list of patients, and the inner one takes the events retrieved for a patient and writes each to a file.
This is the code I'm using. It's pretty rough at the moment, as I'm just trying to get things working.
DataSet1TableAdapters.GetPatientsTableAdapter ta = new DataSet1TableAdapters.GetPatientsTableAdapter();
List<DataSet1.GetPatientsRow> Pats = ta.GetData().ToList();
List<DataSet1.GetPatientEventsRow> events = null;
string patientDir = null;
System.IO.DirectoryInfo di = new DirectoryInfo(txtAllEventsPath.Text);
di.GetDirectories().AsParallel().ForAll((f) => f.Delete(true));
//get at the patients
Parallel.ForEach(Pats
    , new ParallelOptions() { MaxDegreeOfParallelism = 8 }
    , patient =>
    {
        patientDir = "D:\\Events\\" + patient.patientID.ToString();
        //Output directory
        Directory.CreateDirectory(patientDir);
        events = new DataSet1TableAdapters.GetPatientEventsTableAdapter().GetData(patient.patientID).ToList();
        if (Directory.Exists(patientDir))
        {
            Parallel.ForEach(events.AsEnumerable()
                , new ParallelOptions() { MaxDegreeOfParallelism = 8 }
                , ev =>
                {
                    List<DataSet1.GetAllEventRow> anEvent =
                        new DataSet1TableAdapters.GetAllEventTableAdapter();
                    File.WriteAllText(patientDir + "\\" + ev.EventID.ToString() + ".txt", ev.EventData);
                });
        }
    });
The code runs very quickly but throws an exception after a few seconds, by which point about 6,000 files have been produced. The exception is one of two types:
DirectoryNotFoundException: Could not find a part of the path 'D:\Events\PATIENTID\EVENTID.txt'.
Whenever this error is produced, the directory structure D:\Events\PATIENTID\ exists, as other files have been created within that directory. An if condition checks for the existence of D:\Events\PATIENTID\ before the second loop is entered.
The process cannot access the file 'D:\Events\PATIENTID\EVENTID.txt' because it is being used by another process.
When this error occurs, the indicated file sometimes exists and sometimes doesn't.
So, can anyone offer any advice as to why these errors are being produced? I don't understand either of them, and as far as I can see, it should just work (and indeed it does, for a short while).
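One thing I have not ruled out: patientDir and events are declared outside the outer loop, so all of the worker threads read and overwrite the same two variables. Below is a minimal sketch of the variant I could try next, with both declared inside the lambda so that each iteration gets its own copy. EventRow, LoadEvents and Export are hypothetical stand-ins for my typed DataSet rows and table adapters, and the sketch writes under the temp directory rather than D:\Events, so it is only an illustration of the idea and not a confirmed fix.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class EventExportSketch
{
    // Hypothetical stand-in for DataSet1.GetPatientEventsRow.
    public sealed class EventRow
    {
        public int EventID { get; }
        public string EventData { get; }
        public EventRow(int eventID, string eventData)
        {
            EventID = eventID;
            EventData = eventData;
        }
    }

    // Hypothetical stand-in for GetPatientEventsTableAdapter.GetData(patientID):
    // returns 20 fake events per patient, with IDs unique across patients.
    public static List<EventRow> LoadEvents(int patientID)
    {
        return Enumerable.Range(1, 20)
            .Select(i => new EventRow(patientID * 1000 + i, "patient " + patientID + ", event " + i))
            .ToList();
    }

    // Writes one file per event and returns the number of files written.
    public static int Export(string rootDir, IEnumerable<int> patientIDs)
    {
        int written = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };

        Parallel.ForEach(patientIDs, options, patientID =>
        {
            // Declared inside the lambda: each iteration has its own copy,
            // so another thread cannot change it while the inner loop runs.
            string patientDir = Path.Combine(rootDir, patientID.ToString());
            Directory.CreateDirectory(patientDir);
            List<EventRow> events = LoadEvents(patientID);

            Parallel.ForEach(events, options, ev =>
            {
                string path = Path.Combine(patientDir, ev.EventID + ".txt");
                File.WriteAllText(path, ev.EventData);
                Interlocked.Increment(ref written); // thread-safe counter
            });
        });

        return written;
    }

    public static void Main()
    {
        string root = Path.Combine(Path.GetTempPath(), "events_" + Guid.NewGuid().ToString("N"));
        int written = Export(root, Enumerable.Range(1, 50));
        int onDisk = Directory.GetFiles(root, "*.txt", SearchOption.AllDirectories).Length;
        Console.WriteLine("written=" + written + ", on disk=" + onDisk);
        Directory.Delete(root, true); // clean up the temp output
    }
}
```

If the shared variables are the cause, that would fit both symptoms: a thread could build a path from a patientDir that another thread has just reassigned to a directory not yet created, and two threads could end up writing the same patient's events to the same file names at once.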