I need to read different sections of a large file (50-500 GB) from a network-attached storage platform, process them, and do this very quickly.
I'm writing applications using .NET 5, Go, and C++, and I publish my code for all platforms (Windows, Linux, macOS).
The parallel code below works fine when I publish it for Linux and macOS, and I get the benefit of parallel reading (roughly 4x-32x, depending on the number of CPU cores) compared to the single-threaded method.
However, with the same hardware configuration and the same code, I see no speedup on a Windows machine from the parallel method compared to the single-threaded method.
Another unexpected behavior: when I write the same logic in Go for Linux, different distros behave differently. For example, on Ubuntu my code reads in parallel only if the storage device is mounted over NFS. On CentOS, however, it reads in parallel with both configurations (NFS and block storage).
So I'm confused.
- If the problem is the OS, then why can my Go code read in parallel over NFS but not from block storage on Ubuntu?
- If the problem is the language (C# or Go), then why can the C# application read in parallel on Linux (Ubuntu or CentOS) but not on Windows (Windows Server 2019)?
- If the problem is the protocol used to mount the network storage device, then how come I can achieve parallel reads in every scenario when I use CentOS?
You can also find the benchmark tools that I've prepared for this scenario below.
I know this question is a very niche one and only interests people who work with network storage devices, but I'll try my chance in case some OS, storage, or software people can comment on this. Thanks all.
Single Thread Method in C#
// Store max of each section
int[] maxBuffer = new int[numberOfSections];
using (FileStream streamSource = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, FileOptions.Asynchronous))
{
    for (int index = 0; index < numberOfSections; index++)
    {
        byte[] sectionBuffer = new byte[1024L * 20L];
        // Seek to the start of this section, wrapping around at the end of the file
        streamSource.Position = (((long)sectionBuffer.Length + numberOfBytesToSkip) * (long)index) % streamSource.Length;
        streamSource.Read(sectionBuffer, 0, sectionBuffer.Length);
        maxBuffer[index] = sectionBuffer.Max();
    }
}
Console.WriteLine(maxBuffer.Sum());
Parallel Method in C#
// Store max of each section
int[] maxBuffer = new int[numberOfSections];
Parallel.For(0, numberOfSections, index =>
{
    // Each iteration opens its own stream, so no file position is shared between threads
    using (FileStream streamSource = new FileStream(filePathOfLargeFile, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, FileOptions.Asynchronous))
    {
        byte[] sectionBuffer = new byte[1024L * 20L];
        // Seek to the start of this section, wrapping around at the end of the file
        streamSource.Position = (((long)sectionBuffer.Length + numberOfBytesToSkip) * (long)index) % streamSource.Length;
        streamSource.Read(sectionBuffer, 0, sectionBuffer.Length);
        maxBuffer[index] = sectionBuffer.Max();
    }
});
Console.WriteLine(maxBuffer.Sum());
I'm attaching an image to visualize the implementation of the code above.