
I am working in python and I have encountered a problem: I have to initialize a huge array (21 x 2000 x 4000 matrix) so that I can copy a submatrix on it. The problem is that I want it to be really quick since it is for a real-time application, but when I run numpy.ones((21,2000,4000)), it takes about one minute to create this matrix. When I run numpy.zeros((21,2000,4000)), it is instantaneous, but as soon as I copy the submatrix, it takes one minute, while in the first case the copying part was instantaneous.

Is there a faster way to initialize a huge array?

  • *huge* is your keyword here; it takes a while to set 168 million points no matter where you do it, so better to do it only once... – CSᵠ Jun 28 '16 at 09:33
  • I think the difference comes from the fact that `numpy.zeros` seems to create a sparse matrix, which is way lighter than an actual matrix. Therefore when you copy it, it creates the actual matrix, which takes one minute, because it's a huge matrix (168M cells). I don't think you'll have a faster way to initialize a matrix with such dimensions. – ysearka Jun 28 '16 at 09:34

1 Answer


I guess there's not a faster way. The matrix you're building is quite large (8-byte float64 × 21 × 2000 × 4000 ≈ 1.25 GB) and may be taking up a large fraction of the physical memory on your system, so the minute you're waiting is likely the operating system paging other data out to make room. You can check this by watching memory usage and paging in top or a similar tool (e.g., System Monitor) while you do the allocation.
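For reference, the footprint can be checked directly; this is just the arithmetic from above, using the shape from the question:

```python
import numpy as np

# float64 is 8 bytes per element, so the array in the question needs:
n_elements = 21 * 2000 * 4000                              # 168,000,000 elements
size_bytes = n_elements * np.dtype(np.float64).itemsize    # 1,344,000,000 bytes
print(size_bytes / 2**30)                                  # ≈ 1.25 GiB
```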

numpy.zeros seems to be instantaneous when you call it because the OS allocates the memory lazily: pages are only committed when they are first written. As soon as you try to use the array, the OS actually has to fit that data somewhere. See Why the performance difference between numpy.zeros and numpy.zeros_like?
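A quick way to observe the lazy behaviour (a sketch; the exact timings depend on your OS and available RAM):

```python
import numpy as np
import time

shape = (21, 2000, 4000)

t0 = time.perf_counter()
a = np.zeros(shape)        # returns almost immediately: pages not committed yet
t_alloc = time.perf_counter() - t0

t0 = time.perf_counter()
a[:] = 1.0                 # first write touches every page, so the OS commits them now
t_write = time.perf_counter() - t0

print(f"zeros(): {t_alloc:.3f}s, first full write: {t_write:.3f}s")
```

On a machine with plenty of free RAM the write is just a memory-bandwidth-bound fill; when the system has to page other data out first, that is where the minute goes.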

Can you restructure your code so that you only create the submatrices that you were intending to copy, without making the big matrix?

wildwilhelm
  • Thank you for your clarification. It might be impossible to not use the huge matrix, which is a big image with 21 channels (it's actually 21 categories with logits/probability for pixel-wise labelling in deep learning). The submatrices are outputs of the neural network, and it depends on the input stride sizes in the layers... – Okabe Kimtaro Jun 28 '16 at 09:48
  • In that case, as @CSᵠ has stated, you might be able to pre-allocate the matrix somewhere, before you enter into the real-time loop of your application. If you can be careful to always re-use the allocated memory, you'll only have to wait the one minute one time. – wildwilhelm Jun 28 '16 at 09:50
  • If I'm right in my assumption that you're allocating `float64`, you could also try changing the `dtype`. Using `float32` would only require 640 MB, which might be quicker ... – wildwilhelm Jun 28 '16 at 09:53
  • It should be the solution indeed. Well, I posted here because I was not really sure whether it was normal for it to take so long, or whether there was a faster way. Thank you for your explanations :) – Okabe Kimtaro Jun 28 '16 at 09:55
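Putting the two suggestions from the comments together, here is a sketch of the pre-allocate-and-reuse pattern with `float32`. The buffer name, the `copy_submatrix` helper, and the offset logic are illustrative placeholders, not from the original code:

```python
import numpy as np

# Pay the allocation/initialization cost once, before the real-time loop starts.
# float32 (4 bytes/element) halves the footprint: ~640 MB instead of ~1.25 GB.
logits = np.ones((21, 2000, 4000), dtype=np.float32)

def copy_submatrix(buffer, sub, offset):
    """Copy a network output `sub` into the pre-allocated buffer in place."""
    y, x = offset
    c, h, w = sub.shape
    buffer[:c, y:y + h, x:x + w] = sub   # in-place copy: no fresh allocation
    return buffer

# Inside the real-time loop only copy_submatrix is called; `logits` is never
# re-created, so the expensive initialization happens a single time.
```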