
I am trying to remove rows or columns from an image represented by a Numpy array. My image is of type uint16 and 2560 x 2176. As an example, say I want to remove the first 16 columns to make it 2560 x 2160.

I'm a MATLAB-to-Numpy convert, and in MATLAB would use something like:

A = rand(2560, 2176);
A(:, 1:16) = [];

As I understand, this deletes the columns in place and saves a lot of time by not copying to a new array.

For Numpy, previous posts have used commands like numpy.delete. However, the documentation is clear that this returns a copy, and so I must reassign the copy to A. This seems like it would waste a lot of time copying.

import numpy as np

A = np.random.rand(2560, 2176)
A = np.delete(A, np.r_[:16], 1)

Is this truly as fast as an in-place deletion? I feel I must be missing a better method or not understanding how python handles array storage during deletion.

Relevant previous posts:
Removing rows in NumPy efficiently
Documentation for numpy.delete

nicktruesdale
    Do you have any reference for the fact that "Matlab deletes columns in place, without copying data"? I can't find anything official and from what I've read it has to copy the whole array, which seems reasonable to me. In numpy, you'd use basic slicing (tiago's answer) to avoid a copy. Take into account that this means the whole original array will still be in memory because you get a view. – jorgeca Dec 29 '12 at 13:14

1 Answer


Why not just do a slice? Here I'm removing the first 3000 columns instead of 16 to make the memory usage more clear:

import numpy as np
a = np.empty((5000, 5000))
a = a[:, 3000:]

This effectively reduces the size of the array in memory, as can be seen:

In [31]: a = np.zeros((5000, 5000), dtype='d')
In [32]: whos
Variable   Type       Data/Info
-------------------------------
a          ndarray    5000x5000: 25000000 elems, type `float64`, 200000000 bytes (190 Mb)
In [33]: a = a[:, 3000:]
In [34]: whos
Variable   Type       Data/Info
-------------------------------
a          ndarray    5000x2000: 10000000 elems, type `float64`, 80000000 bytes (76 Mb)
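
As jorgeca points out in the comments on the question, the slice is a view, so the original 5000 x 5000 buffer stays allocated for as long as any view references it. A quick sketch (my own example, not from the answer) verifying this with `np.shares_memory`:

```python
import numpy as np

full = np.zeros((5000, 5000))
trimmed = full[:, 3000:]

# The slice is a view: no data were copied, and the view still
# points into the original buffer via its .base attribute.
print(np.shares_memory(full, trimmed))  # True
print(trimmed.base is full)             # True

# Forcing a copy detaches the result, so the big buffer can be
# freed once `full` itself goes out of scope.
trimmed = full[:, 3000:].copy()
print(np.shares_memory(full, trimmed))  # False
```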

For this size of array a slice seems to be about 10,000x faster than your delete option:

%timeit a=np.empty((5000,5000), dtype='d');  a=np.delete(a, np.r_[:3000], 1)
1 loops, best of 3: 404 ms per loop
%timeit a=np.empty((5000,5000), dtype='d');  a=a[:, 3000:]
10000 loops, best of 3: 39.3 us per loop
tiago
  • This does seem better. 1) Are you aware if this requires a copy of the slice into new memory? 2) Is delete inherently slower than slicing, or am I using it improperly? – nicktruesdale Dec 29 '12 at 06:41
    Slicing does not copy the array into new memory (unlike delete). Assigning the array to a slice of it also doesn't copy it into memory, I think. Internally I think numpy just does the equivalent of assigning `a` to a different pointer in the same memory block. – tiago Dec 29 '12 at 07:02
    `np.delete` is slower and will always return a copy, but even fancy indexing (or indexing with a boolean array) is superior to it before numpy 1.8 (in most cases). If a simple slice is not enough, you need to copy (also in matlab, I am sure), but even then you should prefer an indexing array over `np.delete` at this time. – seberg Dec 29 '12 at 12:51
  • @Sticky073 : Normally slicing doesn't copy data; instead it creates a new view. It copies only the numpy array header, not the full data, so it is fast and memory efficient. – Abid Rahman K Dec 29 '12 at 18:33
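
Following seberg's comment above, here is a minimal sketch (my own example) of removing columns that are *not* expressible as a simple slice. A copy is then unavoidable, but a boolean mask does it in one step without `np.delete`:

```python
import numpy as np

a = np.arange(20).reshape(4, 5)

# Remove columns 1 and 3: these are non-contiguous, so no single
# slice can select the remainder and a copy must be made.
keep = np.ones(a.shape[1], dtype=bool)
keep[[1, 3]] = False
b = a[:, keep]   # boolean (fancy) indexing returns a new array

print(b.shape)  # (4, 3)
```

The mask approach also composes nicely: any predicate on the columns (e.g. `a.sum(axis=0) > 0`) yields a `keep` array directly.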