I have a list of files (whose sizes range from 3 KB to more than 500 MB) that I need to parse.
To speed this up, I would like to use Parallel.ForEach to iterate over my list of files.
I know I can use:
Parallel.ForEach(files, new ParallelOptions { MaxDegreeOfParallelism = 2 }, file =>
{
    //Do stuff
});
This makes sure only two files are processed at the same time. However, when both files are 500 MB+, I get an out of memory exception.
Do you know if there is a way in C# to constrain the parallelism with a condition rather than a fixed count? Ideally, I would like to process as many files as possible as long as the total size of the files currently being processed stays below 1 GB (and otherwise wait until one of the in-flight files is done).
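To make it concrete, below is the kind of size-aware gate I am picturing. This is only an untested sketch of the idea, not code I have working; the 1 GB budget and the Monitor-based waiting are my assumptions of how it could be expressed:

long budget = 1L * 1024 * 1024 * 1024; // 1 GB budget for in-flight files
long inFlight = 0;                     // total size of files currently being processed
object gate = new object();

Parallel.ForEach(files, file =>
{
    lock (gate)
    {
        // Wait until this file fits in the budget. A single file larger
        // than the budget is allowed to run alone, so we never deadlock.
        while (inFlight > 0 && inFlight + file.Length > budget)
            Monitor.Wait(gate);
        inFlight += file.Length;
    }
    try
    {
        //Do stuff
    }
    finally
    {
        lock (gate)
        {
            inFlight -= file.Length;
            Monitor.PulseAll(gate); // wake workers waiting for budget
        }
    }
});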
I was also thinking of ordering my list of files by size and pairing the first one with the last one in a Parallel.ForEach loop (assuming the total size of the pair is below 1 GB). But once again, I am not sure:
- If this is possible
- What would be the syntax
As far as I understand, Parallel.ForEach iterates through the list in its given order (in which case it would be impossible to specify how to iterate through my list).
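The closest I can picture is pre-pairing the list myself and running one pair at a time. Again, this is only an untested sketch of the idea (the 1 GB budget check is my assumption):

var sorted = files.OrderBy(f => f.Length).ToList(); // smallest first
long budget = 1L * 1024 * 1024 * 1024;
int lo = 0, hi = sorted.Count - 1;
while (lo <= hi)
{
    // Take the largest remaining file, and pair it with the smallest
    // remaining one if the two together stay below the budget.
    var batch = new List<FileInfo> { sorted[hi--] };
    if (lo <= hi && batch[0].Length + sorted[lo].Length < budget)
        batch.Add(sorted[lo++]);
    Parallel.ForEach(batch, file =>
    {
        //Do stuff
    });
}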
Any advice on how you would do this is appreciated.
Edit 1:
Here is how I read my files. I need to start reading from a specific node, "RootElt", which is why I don't use File.ReadAllText():
using (XmlReader reader = XmlReader.Create(fi.FullName))
{
    // Skip straight to the node I need instead of loading the whole document.
    if (reader.ReadToDescendant("RootElt"))
        return reader.ReadOuterXml();
}
return string.Empty;
NB: I was initially using XDocument and simply doing doc.Load(), but this caused an out of memory exception (even when processing the files one by one), which is not the case with the XmlReader solution.
Once read, I call my deserialize method:
private T Deserialize<T>(string xml)
{
    using (TextReader reader = new StringReader(xml))
    {
        XmlSerializer serializer = new XmlSerializer(typeof(T));
        var report = serializer.Deserialize(reader);
        return (T)report;
    }
}
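For completeness, this is roughly how the two pieces fit together for each file (ReadRootElt stands for the XmlReader snippet above and Report for my target type; both names are just placeholders):

string xml = ReadRootElt(fi);       // the XmlReader snippet above
if (!string.IsNullOrEmpty(xml))
{
    var report = Deserialize<Report>(xml);
    //Do stuff with report
}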