
My SSD is rated at 3.4 GB/s, yet NumPy loads arrays of varying sizes at a peak of 1.3 GB/s - a mere 38% of that. SSD benchmarks confirm the drive does sustain the full 3.4 GB/s. I had better luck with my previous (now dead) SSD. The arrays are 4 GB to 8 GB; the application allows changing their size.
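For reference, a minimal sketch of how such a throughput figure can be measured (the path is hypothetical; the OS page cache can inflate repeat runs, so only a cold first read reflects the disk):

```python
import time
import numpy as np

path = "arr.npy"  # hypothetical 4-8 GB array saved with np.save

t0 = time.perf_counter()
arr = np.load(path)
dt = time.perf_counter() - t0
print(f"{arr.nbytes / dt / 1e9:.2f} GB/s effective read speed")
```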

The problem with memmap is that it relies on the pagefile, and compression is net-slower in my case. AFAIK NumPy's loading is single-threaded; could parallelism be the solution? If so, are there any libraries with API-side support?
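One direction worth trying: file reads release the GIL, so plain threads can issue overlapping reads into disjoint slices of a single preallocated array, which keeps the NVMe queue deeper than one outstanding request. A sketch, assuming an uncompressed `.npy` file with a version 1.0 header; `load_npy_threaded` is a hypothetical helper, not a NumPy API:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def load_npy_threaded(path, n_threads=8):
    # Parse the .npy header, then let several threads fill
    # disjoint slices of one preallocated array.
    with open(path, "rb") as f:
        version = np.lib.format.read_magic(f)        # assumes header v1.0
        shape, fortran, dtype = np.lib.format.read_array_header_1_0(f)
        offset = f.tell()                            # first byte of raw data

    out = np.empty(shape, dtype=dtype, order="F" if fortran else "C")
    flat = out.reshape(-1, order="A").view(np.uint8) # raw-byte view
    step = -(-flat.size // n_threads)                # ceil division

    def read_chunk(start):
        stop = min(start + step, flat.size)
        with open(path, "rb") as f:                  # own handle per thread
            f.seek(offset + start)
            f.readinto(memoryview(flat[start:stop]))  # a robust version would loop

    with ThreadPoolExecutor(n_threads) as pool:
        list(pool.map(read_chunk, range(0, flat.size, step)))
    return out
```

On a 4-core/8-thread CPU, 4-8 threads is a reasonable starting point; NVMe drives typically need a queue depth greater than 1 to approach their rated sequential speeds.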

I've looked at joblib, but it lacks any explicit numpy.load examples. I tried multiprocessing a while back but didn't get far, nor could I find an example anywhere. Note that the array is N-dimensional, with N >= 4, and dtype 'float32' or 'float16'.
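For what it's worth, the joblib pattern for `numpy.load` is just `delayed(np.load)`; a sketch for many files (paths hypothetical). `prefer="threads"` matters here: with processes, each loaded array would be pickled back to the parent, adding a full copy:

```python
import numpy as np
from joblib import Parallel, delayed

files = ["a.npy", "b.npy", "c.npy"]  # hypothetical paths
arrays = Parallel(n_jobs=4, prefer="threads")(
    delayed(np.load)(f) for f in files
)
```

Note this only parallelizes across files; for a single large array, something like the threaded chunk reader sketched above would be needed.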

Windows 10 x64, 24 GB RAM, Intel i7-7700HQ (4 cores / 8 threads), Samsung 970 EVO Plus or Sabrent Rocket SSD.

OverLordGoldDragon
  • The reason for this could be that you also allocate memory when calling `np.load`, which takes a significant amount of time - e.g. see this example of reading (compressed) data with and without preallocated memory: https://stackoverflow.com/a/56761075/4045774 – max9111 Mar 01 '21 at 09:56
  • @max9111 Your code handles preallocation properly, so never mind about `[:] =`. Regardless, this is a "workaround" that shifts the load burden from reading to decompressing, rather than increasing the raw read speed. -- From a now-deleted comment: allocation overhead is `10 us` for my 5.9 GB float16 array - negligible. – OverLordGoldDragon Mar 01 '21 at 15:02
  • Be careful when measuring allocation speed. You can't measure it with, for example, `%timeit np.empty(1000_000_000, dtype=np.float16)`. The real allocation happens when you write to the whole array for the first time. Always benchmark something like an `np.copyto(...)` both with and without a fresh memory allocation, and take the difference (see the sketch after this thread). There may also be different behavior depending on the OS; I see this on Windows at least. – max9111 Mar 01 '21 at 15:13
  • @max9111 It's definitely more noticeable in `decompress` vs `decompress_pre`; I figured I was missing something - thanks. Regardless, it's a difference of something like 5-10% for float16 in my case. – OverLordGoldDragon Mar 01 '21 at 15:25
  • @max9111 Does [this](https://pastebin.com/pud5mWX2) benchmark code look good to you? Thanks in advance – OverLordGoldDragon Mar 05 '21 at 20:19
  • Yes, but shouldn't it be `out_pre[:, :] = x`? Maybe it would also make sense to compare `np.copyto()` with `np.copy`. How are the results? I also looked at the compression example; the relative benefit of preallocation depends on the compression ratio. – max9111 Mar 05 '21 at 21:17
  • @max9111 `[:]`, `[:, :]`, and `[:, :, ..., :]` are all equivalent for full-array assignment; I ran `copyto` too, same numbers - it seems to be the same ops under the hood. Yes, the benefit depends on the time of everything else relative to the assignment, and allocation time also depends strongly on `dtype`. -- I can imagine cases where it makes sense to reuse an internal array, e.g. via `reuse = Reuse(); arr = reuse(shape)`, yielding a nice silent speedup. – OverLordGoldDragon Mar 05 '21 at 21:34
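Following max9111's caveat above, a minimal sketch of the differencing approach to measuring allocation cost (array size arbitrary; `np.empty` alone looks nearly free because pages are only committed on first write):

```python
import time
import numpy as np

x = np.ones(500_000_000, dtype=np.float16)  # ~1 GB source array

# Allocation + first touch: pages are committed on the first write
t0 = time.perf_counter()
out = np.empty_like(x)
np.copyto(out, x)
t_alloc_copy = time.perf_counter() - t0

# Copy alone, into the already-touched buffer
t0 = time.perf_counter()
np.copyto(out, x)
t_copy = time.perf_counter() - t0

print(f"alloc+copy: {t_alloc_copy:.3f}s, copy only: {t_copy:.3f}s, "
      f"allocation/first-touch ~ {t_alloc_copy - t_copy:.3f}s")
```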

0 Answers