
I am reading from a 70 GB memmap array in read-only mode, but I only use about 300 MB of it. From this answer I learned that a memmap doesn't actually use physical memory, so I figured I should copy the part of the array I need into physical memory for better performance.

However, when I `np.copy()` a memmap and inspect the copy with `np.info()`, its class is still `memmap`. Despite that, I see higher memory usage and better performance when I use the copied array.

Does a copied memmap use physical memory, or is something else going on behind the scenes? Or does it just look like I'm using physical memory for the copied array, and my computer is deceiving me as usual?
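The approach described in the question (keep the big file mapped, but pull only the needed region into RAM) can be sketched roughly as follows. This is a minimal illustration, not the asker's code: the helper `load_region_into_ram`, the file name `big.dat`, and the small 1000 x 4 `float32` stand-in file are all hypothetical, since the real file layout is not given.

```python
import os
import tempfile

import numpy as np


def load_region_into_ram(path, dtype, shape, start, stop):
    """Copy rows [start, stop) of a read-only memmap into an ordinary in-memory array.

    Hypothetical helper: the parameters are illustrative, since the real
    file layout from the question is not known.
    """
    mm = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    # Slicing only creates another memmap view onto the mapped file.
    view = mm[start:stop]
    # np.array(..., copy=True) with the default subok=False reads those pages
    # and returns a base-class ndarray that owns newly allocated memory.
    return np.array(view, copy=True)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.dat")
    # Small stand-in for the 70 GB file: 1000 rows x 4 float32 columns.
    writer = np.memmap(path, dtype=np.float32, mode="w+", shape=(1000, 4))
    writer[:] = np.arange(4000, dtype=np.float32).reshape(1000, 4)
    writer.flush()
    del writer  # release the mapping before reopening read-only

    region = load_region_into_ram(path, np.float32, (1000, 4), 100, 200)

print(type(region).__name__, region.shape, region.flags.owndata)
```

Using `np.array(..., copy=True)` here is one choice among several; it is picked because its default `subok=False` returns a plain `ndarray`, which sidesteps the "class is still `memmap`" confusion entirely.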

Jee Seok Yoon

1 Answer


`numpy.memmap` is a subclass of `numpy.ndarray`. `memmap` does not override the `ndarray.copy()` method, so the semantics of `ndarray.copy()` are unchanged: a copy into newly allocated memory is indeed made. For a number of reasons, `ndarray.copy()` tries to preserve the subclass of the returned object. That makes less sense for `numpy.memmap`, but much more sense for other subclasses such as `numpy.matrix`.
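A small experiment makes these points concrete. This sketch assumes a recent NumPy and uses a hypothetical temporary file `demo.dat` in place of a real dataset; it checks that `memmap` subclasses `ndarray`, that the inherited `.copy()` keeps the `memmap` type, and that the copy lives in its own buffer rather than in the mapped file.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in file; any small file-backed array works.
    mm = np.memmap(os.path.join(tmp, "demo.dat"), dtype=np.int64, mode="w+", shape=(10,))
    mm[:] = np.arange(10)

    cp = mm.copy()  # inherited ndarray.copy(); memmap does not override it

    is_subclass = issubclass(np.memmap, np.ndarray)
    copy_is_memmap = type(cp) is np.memmap      # the subclass is preserved
    shares = np.shares_memory(mm, cp)           # False: the copy has its own buffer

    cp[0] = 99                                  # modifies only the in-memory copy
    file_value = int(mm[0])                     # the file-backed value is unchanged

    del mm, cp  # drop the mapping before the temporary directory is removed

print(is_subclass, copy_is_memmap, shares, file_value)
```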

In the case of `numpy.memmap`, the mmap-specific attributes of the copy are set to `None`, so the copied array behaves just like a `numpy.ndarray` except that its type is still `numpy.memmap`. To verify, check the `._mmap` attribute of both the source and the copy.
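The check suggested above might look like this. It is a sketch assuming a recent NumPy: `_mmap` is a private attribute and could change between versions, while `filename`, `offset` and `mode` are the documented `memmap` attributes. For contrast it also takes a slice, which still shares memory with the file and therefore keeps the mapping; the file name `demo.dat` is hypothetical.

```python
import mmap
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    mm = np.memmap(os.path.join(tmp, "demo.dat"), dtype=np.float64, mode="w+", shape=(8,))
    view = mm[2:5]   # a slice shares memory with the mapped file
    cp = mm.copy()   # a copy does not

    # The source holds a real mmap.mmap object (private attribute, version-dependent).
    source_mapped = isinstance(mm._mmap, mmap.mmap)
    # A view refers to the same mapping as its source.
    view_mapped = view._mmap is mm._mmap
    # The copy keeps the memmap type but drops every mmap-specific attribute.
    copy_attrs = (cp._mmap, cp.filename, cp.offset, cp.mode)

    del mm, view, cp

print(source_mapped, view_mapped, copy_attrs)
```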

Robert Kern
  • +1 for the answer with proof! `._mmap` of an actual memmap gives me ``, but `._mmap` of a copied array gives me `None`. – Jee Seok Yoon Feb 06 '17 at 06:25
  • Apparently `np.copy` *does not* preserve subclass, while `ndarray.copy` does. Since the OP specified `np.copy`, doesn't this explain why memory consumption went up? – Mike Feb 14 '20 at 20:21
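The difference raised in the last comment can be checked directly. This is a minimal sketch assuming NumPy 1.19 or later, where `np.copy` gained its `subok` parameter (default `False`); the file name `demo.dat` is hypothetical. It shows that the function and the method return different types, while all three variants allocate a buffer independent of the mapped file.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    mm = np.memmap(os.path.join(tmp, "demo.dat"), dtype=np.int32, mode="w+", shape=(6,))

    via_function = np.copy(mm)                     # subok defaults to False -> base ndarray
    via_function_subok = np.copy(mm, subok=True)   # assumes NumPy >= 1.19: subclass kept
    via_method = mm.copy()                         # ndarray.copy() keeps the subclass

    types = (type(via_function), type(via_function_subok), type(via_method))
    # Every variant owns a new buffer, independent of the mapped file.
    independent = not any(
        np.shares_memory(mm, c) for c in (via_function, via_function_subok, via_method)
    )

    del mm, via_function, via_function_subok, via_method

print([t.__name__ for t in types], independent)
```

In this sketch, the returned type depends on which copy is used, but where the data lives does not: each variant is a fresh in-memory buffer.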