
When I shrink a numpy array using the resize method (i.e. the array gets smaller due to the resize), is it guaranteed that no copy is made?

Example:

import numpy as np
a = np.arange(10)            # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
a.resize(5, refcheck=False)  # array([0, 1, 2, 3, 4])

From my understanding this should always be possible without making a copy. My question: Does the implementation indeed guarantee that this is always the case? Unfortunately the documentation of resize says nothing about it.
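One way to observe what happens on a particular NumPy build is to compare the address of the data buffer before and after the call. This is only an empirical check on one machine, not a guarantee of behaviour; it assumes the array owns its data and reads the address through `__array_interface__`.

```python
import numpy as np

a = np.arange(10)
addr_before = a.__array_interface__["data"][0]  # address of the data buffer

a.resize(5, refcheck=False)  # shrinks in place; the method returns None
addr_after = a.__array_interface__["data"][0]

print(a)                          # [0 1 2 3 4]
print(addr_before == addr_after)  # True only if the buffer was not moved
```

If the second line prints `False`, the buffer was moved during the resize, which implies the data was copied.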

luator
  • 1
    If your new size is always going to be smaller why not slice the array and assign back to yourself? – EdChum Sep 04 '15 at 12:19
  • @EdChum: You mean `a = a[:5]`? To be honest I didn't think of this option. I don't know how slicing works internally, though. Will this work without a copy of the data being made somewhere? – luator Sep 04 '15 at 12:26

1 Answer


A NumPy array is a fixed-size array in the background, so any kind of resizing will always copy the array.

That said, you could take a slice of the array, which effectively uses only a subset of the array without resizing or copying anything.

>>> import numpy
>>> a = numpy.arange(10)
>>> b = a[:5]
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b
array([0, 1, 2, 3, 4])
>>>
>>> a += 10
>>> a
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
>>> b
array([10, 11, 12, 13, 14])
>>>
>>> b += 10
>>> a
array([20, 21, 22, 23, 24, 15, 16, 17, 18, 19])
>>> b
array([20, 21, 22, 23, 24])
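
The session above shows that `a` and `b` see each other's changes. A short sketch of how to confirm directly that the slice is a view onto the same buffer, using `np.shares_memory`, the `base` attribute and the `OWNDATA` flag:

```python
import numpy as np

a = np.arange(10)
b = a[:5]  # basic slicing returns a view, not a copy

print(np.shares_memory(a, b))  # True: both arrays use the same buffer
print(b.base is a)             # True: b refers back to a
print(b.flags["OWNDATA"])      # False: b does not own its memory
```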
Wolph
  • Thanks, I wasn't really aware that I can use slicing for this. This has one caveat, however: The memory for the unused part of `a` is not released (this may be an issue in some cases). What will happen if I do `del a`? Will it keep the full data or is it smart enough to keep only the part that is accessible through `b`? – luator Sep 04 '15 at 13:01
  • 1
    When you do `del a`, the data of `a` is kept completely because it still exists. `b` only displays a small portion of `a`, but the rest is still there. The only way to actually free the memory is to copy the part you need to a new array and delete the old one. In the background numpy just does a standard `malloc`/`free` when creating/deleting arrays, and it's not possible to `free` part of an array in C. – Wolph Sep 04 '15 at 13:05
  • And another question: Why does `resize` have to make a copy when shrinking the array? I am no expert in how memory allocation works but shouldn't it be possible to just release the backmost part of the allocated memory and keeping the front part without copying anything? – luator Sep 04 '15 at 13:05
  • Here's some explanation about the workings in the background: http://stackoverflow.com/questions/2479766/how-allocate-or-free-only-parts-of-an-array – Wolph Sep 04 '15 at 13:06
  • Okay, so if I understand correctly, `realloc` may have the behaviour I am looking for (just shrink the memory block without moving), but it may as well decide to move the block (which needs copying). Thanks for your help :) – luator Sep 04 '15 at 13:17
  • Indeed, although it seems that numpy doesn't currently use realloc for the resize. That said, unless you do this very often you won't notice much performance impact. – Wolph Sep 04 '15 at 13:25
  • It's not so much a matter of performance as of avoiding memory errors. I am doing machine learning with gigabytes of data that have to be kept in memory. Copying the data may just not be possible in some situations due to lack of free memory. For my current problem slicing works fine, though! – luator Sep 04 '15 at 13:35
  • I see. Without knowing more about the use case it's hard to give much meaningful advice but perhaps sparse matrices (`scipy.sparse`) or a combination of multiple arrays would be more beneficial in that case. Also, don't forget to choose the correct data type. `float32` is half the size of `float64` and in many cases enough. – Wolph Sep 04 '15 at 14:52
  • @Wolph The NumPy documentation for `resize` says that a *view* will be returned when possible, but in general it cannot be guaranteed that it won't produce a copy. This means that it's certainly *possible* to resize or truncate without copying. Another option altogether would be to use `mmap` arrays, or even raw ctypes arrays. – ely Sep 04 '15 at 14:55
  • 1
    @Mr.F: Correct, but as far as I am aware that only works when changing the shape (not the size) of the array, for example resizing from `(4, 1)` to `(2, 2)` or `(1, 4)`. I should note that I haven't looked at the source too thoroughly, though. The source can be found here for further inspection: https://github.com/numpy/numpy/search?l=c&q=resize&utf8=%E2%9C%93 – Wolph Sep 04 '15 at 15:20
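
The copy-and-delete approach mentioned in the comments can be sketched as follows. This is a minimal illustration, assuming no other references to the original array exist. The copy temporarily needs memory for both arrays, which is exactly the constraint raised above for very large data.

```python
import numpy as np

a = np.arange(10)
b = a[:5].copy()  # independent array holding only the first five elements
del a             # drops the last reference to the original buffer

print(b.base is None)      # True: b is not a view of anything
print(b.flags["OWNDATA"])  # True: b owns its (smaller) buffer
print(b.nbytes)            # 5 * b.itemsize
```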