Reading different segments of the same file in parallel from multiple threads

Question

I need to read different sections of a large file (50-500 GB) from a network-attached storage platform, process each section, and do all of this very quickly.

I'm writing the applications in .NET 5, Go and C++, and I publish my code for all platforms (Windows, Linux, macOS).

The parallel code below works fine when I publish it for Linux and macOS: parallel reading gives me a speedup of roughly 4x-32x (depending on the number of CPU cores) compared to the single-thread method.

However, with the same hardware configuration and the same code, the parallel method gives no speedup at all over the single-thread method on a Windows machine.

Another unexpected behavior: when I write the same logic in Go for Linux, different distros behave differently. For example, on Ubuntu my code can read in parallel only if the storage device is mounted over NFS, whereas on CentOS it can read in parallel with both configurations (NFS and block storage).

So I'm confused.

If the problem is the OS, then why can my Go code read in parallel from NFS but not from block storage on Ubuntu?
If the problem is the language (C# or Go), then why can the C# application read in parallel on Linux (Ubuntu or CentOS) but not on Windows (Windows Server 2019)?
If the problem is the protocol the network storage device is mounted with, then how can I achieve parallel reads in every scenario when I use CentOS?

You can also find the benchmark tools I've prepared for this scenario below.

storage-benchmark-go

storage-benchmark-csharp

I know this is a very niche question that will only interest people who work with network storage devices, but I'll try my luck in case some OS, storage or software people can comment on it. Thanks, all.

Single Thread Method in C#

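//Store max of each section
//(filePath, numberOfSections and numberOfBytesToSkip are assumed to be defined elsewhere; Max() and Sum() require System.Linq)
int[] maxBuffer = new int[numberOfSections];

using (FileStream streamSource = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, FileOptions.Asynchronous))
{
    for (int index = 0; index < numberOfSections; index++)
    {
        byte[] sectionBuffer = new byte[1024L*20L];
        //Seek to the start of this section, wrapping around at the end of the file
        streamSource.Position = (((long)sectionBuffer.Length + numberOfBytesToSkip) * (long)index)%streamSource.Length;
        //Read can return fewer bytes than requested; the return value is not checked here
        streamSource.Read(sectionBuffer, 0, sectionBuffer.Length);
        maxBuffer[index] = sectionBuffer.Max();
    }
}
Console.WriteLine(maxBuffer.Sum());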
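
The loops in this question ignore the value returned by `Read`, but the `Stream.Read` contract allows it to return fewer bytes than requested, and any section whose wrapped start offset falls near the end of the file will always come back short. A minimal sketch of a section reader that keeps calling `Read` until the buffer is full or the end of the file is reached, and then takes the maximum only over the bytes actually read. `SectionReader`, `SectionOffset`, `ReadSection` and `SectionMax` are hypothetical names; the offset formula is the one used in the question.

```csharp
using System;
using System.IO;

// Hypothetical helpers, not part of the original benchmark.
public static class SectionReader
{
    // Start offset of a section, same formula as the question:
    // sections are (sectionLength + bytesToSkip) apart and wrap at the end of the file.
    public static long SectionOffset(int index, int sectionLength, long bytesToSkip, long fileLength)
        => ((sectionLength + bytesToSkip) * index) % fileLength;

    // Reads up to buffer.Length bytes starting at 'offset'. Loops because
    // Stream.Read may return fewer bytes than requested. Returns the bytes read.
    public static int ReadSection(Stream stream, long offset, byte[] buffer)
    {
        stream.Position = offset;
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0) break; // end of file reached
            total += n;
        }
        return total;
    }

    // Maximum over the first 'count' bytes only (0 if nothing was read),
    // so stale data from a previous, longer read is never included.
    public static int SectionMax(byte[] buffer, int count)
    {
        int max = 0;
        for (int i = 0; i < count; i++)
            if (buffer[i] > max) max = buffer[i];
        return max;
    }
}
```

In the single-thread loop, `int count = SectionReader.ReadSection(streamSource, SectionReader.SectionOffset(index, sectionBuffer.Length, numberOfBytesToSkip, streamSource.Length), sectionBuffer);` followed by `maxBuffer[index] = SectionReader.SectionMax(sectionBuffer, count);` would replace the `Position`/`Read`/`Max()` lines.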

Parallel Method in C#

//Store max of each section
int[] maxBuffer = new int[numberOfSections];

//Each iteration opens its own FileStream, so every section gets an independent file position
Parallel.For(0, numberOfSections, index =>
{
    using (FileStream streamSource = new FileStream(filePathOfLargeFile, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, FileOptions.Asynchronous))
    {
        byte[] sectionBuffer = new byte[1024L*20L];
        streamSource.Position = (((long)sectionBuffer.Length + numberOfBytesToSkip) * (long)index)%streamSource.Length;
        streamSource.Read(sectionBuffer, 0, sectionBuffer.Length);
        //Each index is written by exactly one iteration, so no locking is needed
        maxBuffer[index] = sectionBuffer.Max();
    }
});
Console.WriteLine(maxBuffer.Sum());
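
The parallel version opens and closes a new `FileStream` for every 20 KB section, and on a network share each open may cost a round trip of its own. A sketch of one alternative, not the question's code: the `localInit`/`localFinally` overload of `Parallel.For` gives each worker thread a single stream and buffer for all the sections it handles, `ParallelOptions.MaxDegreeOfParallelism` caps how many reads are in flight, and `ThreadPool.SetMinThreads` raises the pool minimum so that the cap can be reached without waiting for gradual thread injection (note that this setting is process-wide). `ParallelSections.ParallelSectionMax` and its parameters are hypothetical, and whether this changes the Windows results is something to measure rather than assume.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSections
{
    // Hypothetical helper: max byte of each section, read in parallel with
    // one FileStream per worker thread instead of one per section.
    public static int[] ParallelSectionMax(string filePath, int numberOfSections,
        int sectionLength, long bytesToSkip, int maxDegreeOfParallelism)
    {
        int[] maxBuffer = new int[numberOfSections];

        // Parallel.For draws its workers from the ThreadPool; raising the minimum
        // (process-wide) lets it start that many workers without ramp-up delay.
        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        ThreadPool.SetMinThreads(Math.Max(minWorker, maxDegreeOfParallelism), minIo);

        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };

        Parallel.For(0, numberOfSections, options,
            // localInit: runs once per worker, not once per section
            () => (fs: new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read, 4096),
                   buf: new byte[sectionLength]),
            (index, loopState, local) =>
            {
                local.fs.Position = ((sectionLength + bytesToSkip) * index) % local.fs.Length;

                // Keep reading until the buffer is full or the file ends
                int total = 0;
                while (total < local.buf.Length)
                {
                    int n = local.fs.Read(local.buf, total, local.buf.Length - total);
                    if (n == 0) break;
                    total += n;
                }

                int max = 0;
                for (int i = 0; i < total; i++)
                    if (local.buf[i] > max) max = local.buf[i];
                maxBuffer[index] = max; // each index is written by exactly one iteration
                return local;
            },
            // localFinally: close this worker's stream
            local => local.fs.Dispose());

        return maxBuffer;
    }
}
```

For example, `ParallelSections.ParallelSectionMax(filePathOfLargeFile, numberOfSections, 1024 * 20, numberOfBytesToSkip, 2)` mirrors the question's loop with at most two concurrent readers; varying the last argument is one way to see where throughput stops scaling on each OS.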

I'm attaching an image to visualize the implementation of the code above.

I tried it in multiple scenarios with multiple storage units, such as a local machine's physical SSD, a virtual server with NFS, and a virtual server with Fibre Channel Protocol through a SAN switch — Muhsin Gurel, Apr 19 '21 at 13:04
Then I guess it's time to start evaluating your fundamental assumptions. What is the *actual* problem you're trying to solve (hint: it's not performance). https://xyproblem.info/ — Robert Harvey, Apr 19 '21 at 14:03
Why did you specify the `FileOptions.Asynchronous` option in the parallel version, and not in the single-thread version? Be aware that the asynchronous filesystem APIs [are not implemented efficiently](https://stackoverflow.com/questions/63217657/why-file-readalllinesasync-blocks-the-ui-thread) in .NET. — Theodor Zoulias, Apr 19 '21 at 14:31
@TheodorZoulias because my point is to show that the parallel version doesn't make any difference compared to the single-thread version when compiled for Windows. — Muhsin Gurel, Apr 20 '21 at 08:44
@RobertHarvey you are right. I edited the last part of my post. — Muhsin Gurel, Apr 20 '21 at 08:44
If you want to compare fairly the single-thread vs the parallel version, then all other parameters should stay the same. Now you have used two different mechanisms to access the file system, sync with the single-thread version and async with the parallel version, and any difference in performance can be caused by either of these changed parameters. You can't come to valid conclusions by experimenting this way. Be scientific, and change one thing at a time! — Theodor Zoulias, Apr 20 '21 at 08:47
@TheodorZoulias although using the FileOptions.Asynchronous option in the single-thread version doesn't make any difference when only a single application is accessing the file (I have tried that many times), I see your point. People should not be confused by this. I'm editing the code. — Muhsin Gurel, Apr 20 '21 at 09:05
Since you're opening the file in each thread, won't these be competing for access to the file, as in competing for read performance? Wouldn't it make more sense to separate this into one part that reads sections from the file sequentially and doles them out to parallel tasks that process them? I don't believe parallel access to a file is going to be a good idea. In any case, you need to profile your code to figure out where the bottleneck is; everything else is just guesswork. — Lasse V. Karlsen, Apr 20 '21 at 09:13
Now the comparison is fair. I would suggest to leave the `FileOptions.Asynchronous` completely out of the equation, because its use is atypical. This option is intended for enabling the asynchronous filesystem APIs, like the `Stream.ReadAsync` method, that are rarely used because their performance is awful (this may change in the future though). — Theodor Zoulias, Apr 20 '21 at 09:50
When you say "same hardware configuration", do you mean that you used the exact same physical storage with all measurements (Windows/Linux/MacOS)? AFAIK the solid state drives behave quite well when reading different portions of the same file in parallel, but the classic hard disk drives do not. — Theodor Zoulias, Apr 20 '21 at 09:56
@LasseV.Karlsen my bottleneck is definitely the file read operations. Here are the results measuring the ticks for each operation: ticks for finding the max, in total: 5762261; ticks for reading, in total: 161620592. — Muhsin Gurel, Apr 20 '21 at 09:58
@TheodorZoulias I'm using the same network storage unit in our server room with two different virtual machines (both in the same physical rack, also in our server room). Both machines have the same amount of dedicated resources, the same network configuration, etc. The only difference is that one runs Windows Server 2019 and the other runs Ubuntu 18.04. I also compared two identical company laptops with SSDs and the same specs, again one with Windows 10 and the other with Ubuntu 18.04, and the results point to the same thing. For some reason parallel file reading doesn't work on Windows. — Muhsin Gurel, Apr 20 '21 at 10:09
Could you repeat the experiment using a less ambitious parallelization option, like `MaxDegreeOfParallelism = 2`, and with a larger buffer size, like `32768`? I am proposing it because I have seen some (modest) performance improvements while doing my own parallel-reading experiments on a Windows machine with SSD. Also be aware that the `Parallel.For` mechanism is constrained by the `ThreadPool` availability, so to make the `MaxDegreeOfParallelism = 128` actually work as intended you must also do this: `ThreadPool.SetMinThreads(128, 128);` — Theodor Zoulias, Apr 20 '21 at 10:23
@TheodorZoulias thanks for pointing out `ThreadPool.SetMinThreads(128, 128);`. I tried various buffer sizes and got slight improvements; anything lower than 1024 decreases my performance. However, the main issue remains: with the parallel version, the same code runs almost 16 times faster on the Linux machine (which has 16 cores), but for some reason Windows shows no improvement with the parallel version. — Muhsin Gurel, Apr 21 '21 at 11:46
It sounds quite marvelous that reading a file in parallel can be 16 times faster than reading it sequentially (your code doesn't seem to do much more than reading the file). So much that I am inclined to question the validity of your measurements. But I won't do it. So I'll accept that, based on your observations, Windows s*cks and Linux rules! — Theodor Zoulias, Apr 21 '21 at 12:09
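
Lasse V. Karlsen's suggestion of separating reading from processing can be sketched as a producer/consumer pipeline: one task reads the sections sequentially from a single stream and passes them through a bounded `BlockingCollection` to several consumer tasks that compute the per-section maximum. This is an illustration, not the question's code; `PipelinedSections.SequentialReadParallelProcess` and the queue capacity of 64 are hypothetical choices. Given the tick counts in the thread, where reading takes far longer than finding the maximum, this layout would only pay off for heavier per-section processing, because it does not make the reads themselves concurrent.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class PipelinedSections
{
    // Hypothetical helper: one sequential reader, several parallel processors.
    public static int[] SequentialReadParallelProcess(string filePath, int numberOfSections,
        int sectionLength, long bytesToSkip, int consumerCount)
    {
        int[] maxBuffer = new int[numberOfSections];
        // Bounded so the reader cannot run arbitrarily far ahead of the consumers.
        using var queue = new BlockingCollection<(int Index, byte[] Data, int Count)>(boundedCapacity: 64);

        // Producer: reads every section in order from a single stream.
        Task producer = Task.Run(() =>
        {
            try
            {
                using var stream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read, 4096);
                for (int index = 0; index < numberOfSections; index++)
                {
                    byte[] buffer = new byte[sectionLength];
                    stream.Position = ((sectionLength + bytesToSkip) * index) % stream.Length;
                    int total = 0;
                    while (total < buffer.Length)
                    {
                        int n = stream.Read(buffer, total, buffer.Length - total);
                        if (n == 0) break; // end of file
                        total += n;
                    }
                    queue.Add((index, buffer, total));
                }
            }
            finally
            {
                queue.CompleteAdding(); // lets consumers finish even if reading fails
            }
        });

        // Consumers: compute the max of each section as it arrives.
        Task[] consumers = Enumerable.Range(0, consumerCount).Select(_ => Task.Run(() =>
        {
            foreach (var (index, data, count) in queue.GetConsumingEnumerable())
            {
                int max = 0;
                for (int i = 0; i < count; i++)
                    if (data[i] > max) max = data[i];
                maxBuffer[index] = max;
            }
        })).ToArray();

        Task.WaitAll(consumers);
        producer.Wait(); // rethrows any read error
        return maxBuffer;
    }
}
```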
