
From timing the creation of N x 4096 x 4096 arrays, it appears NumPy does this much faster when N = 2 or 3 than when N = 1:

import numpy as np
%timeit a = np.zeros((2, 4096, 4096), dtype=np.float32, order='C')
5.24 µs ± 98.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit a = np.zeros((4096, 4096), dtype=np.float32, order='C')
23.4 ms ± 401 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

The difference is shocking. Why is that, and how can I make the N = 1 case at least as fast as the N > 1 case? Could %timeit simply be wrong for timing this?

Context: I also need to create another single 4096 x 4096 array with a different dtype (uint8), and I'm trying to find the fastest Pythonic (or NumPy-based) implementation. The N x 4096 x 4096 array will be populated with non-zero values read from a 3-column array (loaded from a file), where the 1st column holds 1D coordinates and the 2nd and 3rd columns hold the intensity values for the 1st and 2nd image (hence the N = 2 case). Using a sparse matrix is not an option for now. There are 130 million such files, so the above happens that many times.
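
For reference, here is a minimal sketch of how that population step could look using flat (1D) indexing; the file name, variable names and exact column layout are assumptions based on the description above:

import numpy as np

# Hypothetical 3-column input: column 0 holds flat 1D coordinates,
# columns 1 and 2 hold the intensities for the two images.
data = np.loadtxt("frame.txt")            # placeholder file name, shape (M, 3)
coords = data[:, 0].astype(np.intp)       # flat indices into a 4096*4096 image

stack = np.zeros((2, 4096, 4096), dtype=np.float32)
flat = stack.reshape(2, -1)               # a view, no copy
flat[0, coords] = data[:, 1]              # intensities for the 1st image
flat[1, coords] = data[:, 2]              # intensities for the 2nd image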

[EDIT] This is with Python 3.6.4 and NumPy 1.14 under macOS Sierra. The same versions under Windows do not reproduce this behavior: there, np.zeros() for the smaller array takes about half the time of the twice-larger array. From the comments and the mentioned duplicate question, I understand this can be due to thresholds in the memory allocator. This does, however, defeat the purpose of %timeit.

[EDIT 2] Regarding the duplicate question: the question here is now more about how to time this function properly, without having to write extra code that accesses the array so that the OS actually allocates the memory. Wouldn't that extra code bias the timing result? Isn't there a simple way to profile this?
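
One way to approach this (a sketch, not a definitive answer): time the bare np.zeros() call, and separately time the same call followed by a write that touches every page, then compare the two. The shape, repetition count and per-page stride below are assumptions:

import timeit
import numpy as np

shape = (4096, 4096)
reps = 20

# Bare allocation: on some platforms the OS may only map zeroed pages lazily.
t_alloc = timeit.timeit(lambda: np.zeros(shape, dtype=np.float32), number=reps)

def alloc_and_touch():
    a = np.zeros(shape, dtype=np.float32)
    # Write one element per 4 KiB page (1024 float32 values) so the OS
    # actually commits the memory instead of deferring it to first use.
    a.reshape(-1)[::1024] = 1.0
    return a

# Allocation plus first touch: closer to the cost paid when the array is used.
t_touch = timeit.timeit(alloc_and_touch, number=reps)

print("alloc only :", t_alloc / reps * 1e3, "ms per loop")
print("alloc+touch:", t_touch / reps * 1e3, "ms per loop")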

Wall-E
  • I'm not able to reproduce this, so it might be machine and/or numpy version dependent. I'm using python 2.6.6 and numpy version 1.4.1 on CentOS 6 and I get 22 ms for the first one and 11 ms for the second one. It might help if you can update the question with your numpy and python version and what OS you're using. – user545424 Mar 19 '18 at 21:47
  • If you're planning to open 130 million files and create a 2x4096x4096 array for each one, that's over 4 petabytes of array and a lot of disk access. Even if you don't need to store all those arrays in memory at once, this is going to be crazy slow. – user2357112 Mar 19 '18 at 22:00
  • I've added software versions. @user2357112 yes, we are aware this will be crazy slow (a few months at the moment), hence our attempt at understanding all possible bottlenecks. This will of course happen in parallel, on more suitable infrastructure. – Wall-E Mar 19 '18 at 23:18
  • 1
    Another test here. I have a timing between 10 and 11 microseconds for both cases. Debian, Python 3.6.4, NumPy 1.13.3. If *allocation time* is specifically annoying you, you can reuse an existing array and overwrite its content. If you need all data open at once, you might need to rethink your strategy. There is work toward distributed array processing in Python, have a look at [dask](https://dask.pydata.org/en/latest/) or [Matthew Rocklin's blog](https://matthewrocklin.com/blog/work/2018/01/22/pangeo-2) – Pierre de Buyl Mar 20 '18 at 11:43
  • I couldn't reproduce this behaviour, but if you run these tests one after another, the garbage collector could do some harm here. Also compare the speed to pure memory allocation (np.empty). But I think with this amount of data, high I/O speed and fast data-compression strategies are your main problem. – max9111 Mar 22 '18 at 09:33

0 Answers