How do I safely preallocate an integer matrix as an index matrix in numpy

Question

I want to preallocate an integer matrix to store indices generated in iterations. In MATLAB this can be done with `IXS = zeros(r,c)` before the `for` loops, where `r` and `c` are the numbers of rows and columns. All indices produced in the subsequent loops can then be assigned into `IXS`, which avoids growing the array dynamically. If I accidentally select a 0 from it, for example because I picked up these indices the wrong way when selecting elements from a matrix, MATLAB raises an error.

But in numpy, 0 and negative values are also valid indices. For example, suppose I preallocate `IXS=np.zeros([r,c],dtype=int)` in numpy. In MATLAB, the submatrix specified by the indices previously assigned into `IXS` can be obtained inside a loop with `X(:,IXS(IXS~=0))`, but if I perform the selection the same way in numpy, the first row/column may be lost, because 0 is a legitimate index there.
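A minimal numpy sketch of the pitfall just described (the matrix `X` and the stored indices are made up for illustration): with zero-based indexing, a genuinely assigned index 0 cannot be told apart from an unassigned slot, so the MATLAB-style `!= 0` filter throws it away.

```python
import numpy as np

X = np.arange(12).reshape(3, 4)   # hypothetical data matrix
IXS = np.zeros(5, dtype=int)      # preallocated "index vector", 0 = unassigned?
IXS[:3] = [0, 2, 3]               # column 0 is a legitimate selection

# MATLAB-style filter: keep only the non-zero entries.
kept = IXS[IXS != 0]              # the real index 0 is discarded as well
sub = X[:, kept]                  # only columns 2 and 3; column 0 is lost

print(kept)
print(sub.shape)
```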

Further, in a large program that operates on large matrices, preallocation is important for speed, and in MATLAB it is easy to locate errors caused by wrong indexing, because selecting a 0 raises an error. In numpy, if I select from an array with, for example, `X[:,IXS[:n]]` and a wrong `n`, no error occurs. I then have to spend a lot of time finding where the error is. Worse, if the final results do not look too strange, I may never notice the bug. This happens often in my program, so I have to debug my code again and again.

I wonder: is there a safe way to preallocate such an index matrix in numpy?

You need to add an example or two. Your concern isn't obvious. `x=np.zeros([r,c], int)` is a normal array with r rows and c columns, filled with 0s and with integer dtype. What is an `index matrix`? – hpaulj, Jan 08 '17 at 16:02
I'm still puzzled. Show a small example in both MATLAB (I can test in Octave) and numpy. Because MATLAB indexing runs from 1:n, you can use 0 as a kind of 'unassigned' value, whereas numpy indexing starts at 0 and interprets `-1` as `end`. But usually that's not an issue. There's something unusual about how you create and use this `IXS` array. – hpaulj, Jan 08 '17 at 17:24
Simple example: suppose `IXS=[1,2,4,0,0]` and `x=[1,2,3,4,5]`, and I want the Euclidean norm of a subarray. MATLAB raises an error if I use `norm(x(IXS(1:4)))`, but numpy raises no error for `np.linalg.norm(x[IXS[:4]])`, even though the result is obviously wrong. You are right that this only happens if I use `IXS` incorrectly or build wrong indices for picking the subarray, as in the example. What I want is for an error to be raised in that situation, so that I can easily locate the bug. This matters in my program because it performs many matrix operations. – Elkan, Jan 08 '17 at 17:46
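The numpy half of that example can be reproduced directly; this sketch only restates the values from the comment. Slicing one slot too far quietly pulls in `x[0]` through the placeholder 0, and both norms are computed without complaint.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
IXS = np.array([1, 2, 4, 0, 0])      # the last two slots were never filled

# Only the first three entries are real indices.
wrong = np.linalg.norm(x[IXS[:4]])   # uses x[1], x[2], x[4] and, silently, x[0]
right = np.linalg.norm(x[IXS[:3]])   # uses x[1], x[2], x[4]

print(wrong, right)                  # two different numbers, no error either way
```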

score 2 · Answer 1 · answered Jan 08 '17 at 19:03

How about filling the index array with values that are obviously too large:

```
In [156]: x=np.array([1,2,3,4,5])
In [157]: idx=np.full(6,999,dtype=int)
In [158]: idx[:3]=[1,0,4]
In [159]: idx
Out[159]: array([  1,   0,   4, 999, 999, 999])
In [160]: x[idx[:3]]
Out[160]: array([2, 1, 5])
In [161]: x[idx[:4]]
...
IndexError: index 999 is out of bounds for axis 0 with size 5
```

hpaulj

Thanks. This is what I'm doing in my code, since the size of the matrix is fixed during computation. But I don't think this is safe enough: in other circumstances, indices generated for small matrices may be used on large matrices, so I have to decide how large is large enough. Maybe `np.full(6, np.inf, dtype=int)` is more appropriate, but what should I do if an array has more than 2147483648 elements in the future? – Elkan Jan 08 '17 at 19:27
SO question about maximum array dimensions and memory limits: http://stackoverflow.com/q/14525344/901925 – hpaulj Jan 08 '17 at 19:35
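One way to settle the "how large is large enough" question, offered here as an assumption rather than something proposed in the thread: use the largest value of `np.intp`, numpy's indexing integer type, as the fill value. No array axis can be that long, so the value is out of bounds for every array, whatever its size. The helper name `preallocate_indices` is hypothetical.

```python
import numpy as np

# Largest value of numpy's index type; no axis can have this many elements,
# so it is never a valid index.
SENTINEL = np.iinfo(np.intp).max

def preallocate_indices(shape):
    """Hypothetical helper: an index array whose unfilled slots are invalid."""
    return np.full(shape, SENTINEL, dtype=np.intp)

x = np.array([1, 2, 3, 4, 5])
idx = preallocate_indices(6)
idx[:3] = [1, 0, 4]

print(x[idx[:3]])        # only filled slots are used: works

try:
    x[idx[:4]]           # one unfilled slot is included
    raised = False
except IndexError as err:
    raised = True
    print("IndexError:", err)
```

A positive maximum is used rather than `np.iinfo(np.intp).min` so the sentinel cannot be confused with numpy's negative indexing. Arithmetic on the sentinel (for example `idx + 1`) would silently wrap around, so it should only ever be used for indexing.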

score 1 · Answer 2 · answered Jan 08 '17 at 17:02

The equivalent of MATLAB `zeros` in numpy is `numpy.zeros`:

> Return a new array of given shape and type, filled with zeros.
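A short illustration of using `np.zeros` for an index array (sizes chosen arbitrarily): the default dtype is `float64`, so `dtype=int` has to be given explicitly, because numpy rejects float arrays as indices. Note that the resulting zeros are themselves valid indices and select element 0 without any error.

```python
import numpy as np

r, c = 2, 3
A = np.zeros((r, c))               # default dtype is float64
IXS = np.zeros((r, c), dtype=int)  # integer array, usable for indexing

x = np.arange(5)
print(A.dtype, IXS.dtype)
print(x[IXS[0]])                   # selects x[0] three times, no error

try:
    x[A[0]]
    float_ok = True
except IndexError:
    float_ok = False               # float arrays cannot be used as indices
```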

Stephen Rauch

score 1 · Answer 3 · answered Jan 09 '17 at 08:03

If you really really want to catch errors that way, initialize your indices with NaN.

```python
IXS=np.full((r,c),np.nan, dtype=int)
```

That will always raise an IndexError.

Daniel F

This is the same as `np.full((r,c), np.inf, dtype=int)`, which I suggested in my comment on the second answer. In my numpy, both generate arrays filled with `-2147483648`. – Elkan Jan 09 '17 at 08:11
`np.inf==-2147483648` is `True`, `np.nan==-2147483648` is `False`. `np.nan` is never equal to anything, not even itself. The large negative value in the error message comes from the data search algorithm, not `np.nan`. In any case, numpy won't let you make arrays that big. – Daniel F Jan 09 '17 at 08:18
Thanks. This can raise problems. Suppose `x = np.full(6,np.inf,dtype=int)`; then `x[0]==-2147483648` is `True`. If I fill `x` with `np.nan` instead, `x[0]==-2147483648` is also `True`. But both `np.isinf(x)` (when `x` is filled with `np.inf`) and `np.isnan(x)` (when `x` is filled with `np.nan`) return all-`False` arrays. Why can't I use the `is*` functions to identify `nan` and `inf`? If I can't, I don't think this method is very convenient for my purpose. – Elkan Jan 09 '17 at 08:36
Have you tested that? If I do `x=np.full(6,np.nan,dtype=int)`, then `x[0]==-2147483648` is `False`. And `np.isnan(x)` gives `array([ True, True, True, True, True, True], dtype=bool)`. This is as it should be. – Daniel F Jan 09 '17 at 10:12
Yes, I'm using `Spyder 2.3.5.2`, from `python(x,y) 2.7.10.0`, on `Windows 10, 64 bit`. I also tested this in my `IPython (Qt)` from same `python (x,y)`, same results as in `Spyder` are obtained. – Elkan Jan 09 '17 at 10:23
Huh, apparently that changed in Python 3, sorry. I get the same as you in a 2.7 environment. It seems 2.7 converts the `np.nan` to an integer due to the dtype, as just `np.full(6,np.nan)` works as intended, except that the `dtype` of `np.nan` is `float64`, not `int`. Even in 3.5. Hmm. – Daniel F Jan 09 '17 at 10:37
See new answer below – Daniel F Jan 09 '17 at 11:04
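Much of the confusion in this comment thread comes from the fact that an integer array cannot store NaN or infinity at all. The sketch below applies an explicit float-to-integer cast, which is roughly what `np.full(..., dtype=int)` does with a float fill value. The resulting integers are platform dependent (on typical x86 builds they are the most negative integer, which is consistent with the `-2147483648` seen above on Windows, where the default `int` was 32-bit), and `np.isnan` on an integer array is always `False`. The cast warning is silenced here only to keep the output short.

```python
import numpy as np

vals = np.array([np.nan, np.inf, -np.inf])   # float64 "sentinel" values

with np.errstate(invalid="ignore"):          # silence the invalid-cast warning
    as_int = vals.astype(np.intp)            # unsafe float -> integer cast

print(as_int)                 # platform-dependent integers, not NaN/inf
print(np.isnan(as_int))       # always False: integers have no NaN
print(np.isnan(vals))         # only the float original still holds NaN
```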

score 1 · Answer 4 · answered Jan 09 '17 at 11:03

1

Use a `numpy.ma.masked_array`:

```
IXS=np.ma.masked_values(np.zeros((3,4),dtype=int),0)

masked_array(data =
 [[-- -- -- --]
 [-- -- -- --]
 [-- -- -- --]],
             mask =
 [[ True  True  True  True]
 [ True  True  True  True]
 [ True  True  True  True]],
       fill_value = 0)
```

Now if you set a value, you can use it as an index:

```
a=np.arange(10)
IXS[2,2]=5
a[IXS[2,2]]

5
```

But if you don't:

```
IXS[0,0]

masked

a[IXS[0,0]]

IndexError: arrays used as indices must be of integer (or boolean) type
```
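One caveat worth checking before relying on this: only scalar access returns `masked`. Indexing with a masked *slice* such as `IXS[0, :3]`, which is closer to the `X[:, IXS[:n]]` pattern in the question, uses the underlying data and ignores the mask, so the placeholder zeros are selected silently. The sketch below shows this and restores the error with an explicit mask check; the helper name `checked` is hypothetical.

```python
import numpy as np

a = np.arange(10)
IXS = np.ma.masked_values(np.zeros((3, 4), dtype=int), 0)  # all slots masked
IXS[0, 2] = 5                                             # fill one slot

row = IXS[0, :3]        # two masked (unfilled) slots and one filled slot
silent = a[row]         # plain indexing ignores the mask: zeros are used

def checked(ixs):
    """Hypothetical helper: refuse to index with any masked (unfilled) slot."""
    if np.ma.getmaskarray(ixs).any():
        raise IndexError("index array contains unfilled (masked) entries")
    return np.ma.getdata(ixs)

try:
    a[checked(row)]
    caught = False
except IndexError:
    caught = True

print(silent, caught, a[checked(IXS[0, 2:3])])
```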

Daniel F

If I use this, should I be aware of the warning in the comment on the first answer of [this post](http://stackoverflow.com/questions/12708807/numpy-integer-nan)? – Elkan Jan 09 '17 at 12:28
You're already implementing numpy using `for` loops, so your performance isn't going to be great in any case. Pre-allocation isn't technically needed otherwise. First get your code working, then ask another question to optimize it (which will almost certainly lead to a vectorized solution). – Daniel F Jan 09 '17 at 12:37
Thanks. I will try. – Elkan Jan 09 '17 at 12:44
