I was recently asked a question in an interview and it really got me thinking.
I am trying to understand and learn more about multithreading, parallelism and concurrency, and performance.
The scenario is that you have a list of file paths. The files are stored on your HDD or in blob storage. You have to read the files and store them in a database. How would you do it in the most efficient manner?
The following are some of the ways that I could think of:
The simplest way is to loop through the list and perform this task sequentially.
foreach (var filePath in filePaths)
{
    ProcessFile(filePath);
}

// readFile and storeInDb are placeholders for the actual I/O calls.
public void ProcessFile(string filePath)
{
    var file = readFile(filePath);
    storeInDb(file);
}
The second way I could think of is to create one thread per file:
foreach (var filePath in filePaths)
{
    // The Thread constructor takes a delegate, so the call is wrapped in a lambda.
    Thread t = new Thread(() => ProcessFile(filePath));
    t.Start();
}
(not sure if the above code is correct.)
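The third way is to use async/await:
var listOfTasks = new List<Task>();
foreach (var filePath in filePaths)
{
    var task = ProcessFile(filePath);
    listOfTasks.Add(task);
}
await Task.WhenAll(listOfTasks);

// Returns Task rather than void so that the caller can collect and await it.
// readFileAsync and storeInDbAsync are assumed async counterparts of the placeholder I/O calls.
public async Task ProcessFile(string filePath)
{
    var file = await readFileAsync(filePath);
    await storeInDbAsync(file);
}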
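The loop above starts every file at once, with nothing limiting how many are in flight. As a point of comparison, here is a minimal sketch of the same async approach with a cap on concurrency, using SemaphoreSlim. The class name ThrottledImport, the method ProcessAllAsync, and the storeInDbAsync delegate are hypothetical names introduced only for illustration; the delegate stands in for whatever asynchronous database call is actually available. It also assumes the files are on local disk and a runtime that has File.ReadAllBytesAsync (.NET Core 2.0 or later).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledImport
{
    // storeInDbAsync is a hypothetical callback standing in for the real
    // asynchronous database write; it receives the path and the file content.
    public static async Task ProcessAllAsync(
        IEnumerable<string> filePaths,
        Func<string, byte[], Task> storeInDbAsync,
        int maxConcurrency = 10)
    {
        // The semaphore lets at most maxConcurrency files be processed at once.
        using var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = filePaths.Select(async filePath =>
        {
            await gate.WaitAsync();
            try
            {
                // Asynchronous read: no thread is blocked while the I/O is pending.
                byte[] content = await File.ReadAllBytesAsync(filePath);
                await storeInDbAsync(filePath, content);
            }
            finally
            {
                // Always free the slot, even if the read or the write throws.
                gate.Release();
            }
        }).ToList();

        // Completes when every file is done; rethrows if any of them failed.
        await Task.WhenAll(tasks);
    }
}
```

It could be called as await ThrottledImport.ProcessAllAsync(filePaths, SaveAsync, 10), where SaveAsync is whatever async method writes to the database. The limit plays the same role as MaxDegreeOfParallelism does for Parallel.For, but without tying up a thread per file while waiting on I/O.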
The fourth way is Parallel.For:
Parallel.For(0, filePaths.Count, new ParallelOptions { MaxDegreeOfParallelism = 10 }, i =>
{
    ProcessFile(filePaths[i]);
});
What are the differences between them? Which one would be best suited for the job, and is there anything better?