
Say I'm going to use 30 threads (on a 40-core CPU) to do 4K random reads from the same file and then do some computing.

These 30 threads don't share any objects except for the fd.

I initially wanted to open the file once before starting the threading stuff, and then call pread repeatedly in each thread (I know the sequence of offsets in advance, and the sequence has 64 entries, so there will be 64 pread calls).

But a shared fd seems inefficient: because it will be used across 30 threads, I expect a lot of lock-unlock time on pread (it's an atomic operation, so there must be locking in its implementation).

Is it better to open the file separately in each thread?
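
To make the first plan concrete, here is a minimal sketch of the open-once, pread-per-thread layout, under a few assumptions: `struct read_task`, `read_worker`, `run_read_tasks`, and the `MAX_THREADS` cap are hypothetical names invented for illustration, and the byte sum is only a placeholder for the real computing. It does not measure whether sharing the fd costs anything; it only shows the structure being asked about.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE  4096   /* 4K reads, as in the question */
#define MAX_THREADS 64     /* arbitrary cap chosen for this sketch */

/* Hypothetical per-thread task: the shared fd plus this thread's own offsets. */
struct read_task {
    int          fd;       /* filled in by run_read_tasks, shared by all threads */
    const off_t *offsets;  /* offsets known in advance */
    size_t       count;    /* number of offsets for this thread */
    uint64_t     sum;      /* placeholder for "some computing" */
    int          failed;   /* set to 1 if any pread fails */
};

static void *read_worker(void *arg)
{
    struct read_task *task = arg;
    unsigned char buf[BLOCK_SIZE];  /* private buffer; only the fd is shared */

    for (size_t i = 0; i < task->count; i++) {
        /* pread takes an explicit offset and does not move the fd's file position */
        ssize_t n = pread(task->fd, buf, sizeof buf, task->offsets[i]);
        if (n < 0) {
            task->failed = 1;
            break;
        }
        /* n may be shorter than BLOCK_SIZE near end of file; use only n bytes */
        for (ssize_t j = 0; j < n; j++)
            task->sum += buf[j];
    }
    return NULL;
}

/* Opens `path` once, runs one thread per task on the shared fd, waits for all.
 * Returns 0 if every thread started and every read succeeded, otherwise -1. */
int run_read_tasks(const char *path, struct read_task *tasks, size_t nthreads)
{
    pthread_t tids[MAX_THREADS];
    size_t started = 0;
    int rc = 0;

    assert(nthreads <= MAX_THREADS);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    for (; started < nthreads; started++) {
        tasks[started].fd = fd;
        tasks[started].sum = 0;
        tasks[started].failed = 0;
        if (pthread_create(&tids[started], NULL, read_worker, &tasks[started]) != 0) {
            rc = -1;
            break;
        }
    }
    /* Join only the threads that were actually created */
    for (size_t i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        if (tasks[i].failed)
            rc = -1;
    }
    close(fd);
    return rc;
}
```

Trying the per-thread variant would only mean moving the `open` and `close` calls into `read_worker`, so the two layouts can be timed against each other on the actual file and hardware.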

Bing Zhao
  • You say "These 30 threads don't share any objects.". Does this mean "I have decided not to share the fd between the threads like one normally would"? – that other guy Apr 28 '20 at 18:02
  • So, one thread could handle all the reads and hand off buffers to a thread pool, but with I/O, I am very concerned that having many cores won't help much. – Michael Dorgan Apr 28 '20 at 18:15
  • @thatotherguy I just noticed that `pread` is an atomic operation, so a shared `fd` won't cause any problems. – Bing Zhao Apr 28 '20 at 18:15
  • Right. The point of `pread` is to let everyone share the same fd without worrying about the current state of the cursor – that other guy Apr 28 '20 at 18:18
  • The simpler and _faster_ way is to `mmap` the entire file [once]. Then each thread can be given offset pointer and range/length. See my answer: https://stackoverflow.com/questions/60779978/memory-leak-how-do-i-allocate-memory-for-a-typdef-struct-passed-within-another/60780421#60780421 `mmap` in this way will have _much_ better performance than seeking/reading on different file descriptors – Craig Estey Apr 28 '20 at 18:28
  • @CraigEstey The file is bigger than the memory, so `mmap` cannot be used. – Bing Zhao May 05 '20 at 17:16
  • When you say "memory", what are you talking about? Physical RAM? If you're on a 64 bit machine, you _can_ map a much larger file. You _can_ map the entire file. I know because I've done it, per my linked answer, and other times I've done it. It can slow the system down due to page faults, so I have a version that maps the file in [more] manageable chunks [which I've done on 32 bit machines]. See my other answer: https://stackoverflow.com/a/37173063/5382650 and also my answers that are linked in that one. – Craig Estey May 05 '20 at 18:16
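
The `mmap` suggestion in the comments above could look roughly like the following sketch: map the whole file read-only once, then give each thread an offset and a length into the mapping. The names `struct file_map`, `map_file`, `sum_range`, and `unmap_file` are hypothetical, the byte sum stands in for the real computing, and the sketch assumes a 64-bit build where the file fits in the address space. The performance claim is the commenter's and is not verified here.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical handle for a read-only mapping of a whole file. */
struct file_map {
    const unsigned char *base;  /* start of the mapping */
    size_t               len;   /* file size in bytes */
};

/* Maps the whole file read-only. Returns 0 on success, -1 on failure. */
int map_file(const char *path, struct file_map *out)
{
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after the fd is closed */
    if (p == MAP_FAILED)
        return -1;
    out->base = p;
    out->len = (size_t)st.st_size;
    return 0;
}

/* Sums `len` bytes starting at `offset`; each thread would call this on its
 * own ranges. Returns 0 on success, -1 if the range is outside the file. */
int sum_range(const struct file_map *m, size_t offset, size_t len, uint64_t *sum)
{
    if (offset > m->len || len > m->len - offset)
        return -1;
    uint64_t s = 0;
    for (size_t i = 0; i < len; i++)
        s += m->base[offset + i];
    *sum = s;
    return 0;
}

/* Releases the mapping created by map_file. */
void unmap_file(struct file_map *m)
{
    munmap((void *)m->base, m->len);
    m->base = NULL;
    m->len = 0;
}
```

Because the mapping is read-only, threads can call `sum_range` concurrently without any locking of their own; page faults on first touch are where the cost mentioned in the comment would show up.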

0 Answers