
Let's say I create two numpy arrays: one an empty array, and one a 1000x1000 array of zeros:

import numpy as np
A1 = np.array([])
A2 = np.zeros([1000,1000])

When I want to change a value in A2, this seems to work fine:

A2[n,m] = 17

The above code changes the value at position [n, m] in A2 to 17.

When I try the above with A1 I get this error:

A1[n,m] = 17

IndexError: index n is out of bounds for axis 0 with size 0

I know why this happens: there is no position [n, m] in A1, which makes sense. But my question is as follows:

Is there a way to define a dynamic array that grows with new rows and columns whenever A[n,m] = somevalue is assigned with n or m (or both) beyond the current bounds of an array A?

It doesn't have to be in numpy; any library or method that can update the array size would be awesome. If it is a method, I imagine it would check whether [n, m] is out of bounds and do something about it.
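To illustrate, here is a rough sketch of the kind of helper I have in mind (the function name is mine, and it assumes a 2-D array):

```python
import numpy as np

def set_growing(a, n, m, value):
    """Hypothetical helper: grow a 2-D array so that (n, m) is
    in bounds (padding with zeros), then assign the value."""
    rows = max(a.shape[0], n + 1)
    cols = max(a.shape[1], m + 1)
    if (rows, cols) != a.shape:
        grown = np.zeros((rows, cols), dtype=a.dtype)
        grown[:a.shape[0], :a.shape[1]] = a  # copy the old data over
        a = grown
    a[n, m] = value
    return a  # the caller must rebind, since a new array may be created

A = np.zeros((2, 2))
A = set_growing(A, 4, 3, 17)
print(A.shape)   # (5, 4)
print(A[4, 3])   # 17.0
```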

I am coming from a MATLAB background, where this is easy to do. I tried to find something about this in the numpy.array documentation, but I've been unsuccessful.

EDIT: I want to know if such a dynamic array is possible at all in Python, not just in the numpy library. It appears from [Creating a dynamic array using numpy in python](https://stackoverflow.com/questions/46766469/creating-a-dynamic-array-using-numpy-in-python) that it doesn't work with numpy.

Ariel A
    Preferred indexing in `numpy` is `A2[n, m] = 17` – hpaulj Feb 20 '20 at 03:21
    Does this answer your question? [Creating a dynamic array using numpy in python](https://stackoverflow.com/questions/46766469/creating-a-dynamic-array-using-numpy-in-python) – Grismar Feb 20 '20 at 03:22
  • Note that I linked the duplicate question, but agree there is no accepted answer there. A fairly good attempt at an answer is given though and it may answer your question entirely. If you feel your question is more specific, please update your own question to set it apart from the more general one. – Grismar Feb 20 '20 at 03:23
    I suspect MATLAB implements this with some form of pad or concatenate. As such I expect it is relatively slow. I don't recall learning to do this when I worked with MATLAB years ago, but they've done a lot of work to make things simpler for casual users. – hpaulj Feb 20 '20 at 03:28
    Anything you cook up is likely to be a lot slower than NumPy and incompatible with most of the numerical programming tools you're likely to want to use (most of which are built upon NumPy). Python semantics also make this fundamentally less feasible (particularly the reference-oriented variable semantics instead of Matlab's pass-by-value, and NumPy's heavy use of array views, which I don't think Matlab has). You'll probably be a more effective Python programmer if you make your array resizing explicit. – user2357112 Feb 20 '20 at 03:28
  • @Grismar The question you're posting asks what I need (and uses the right word for it), but I was wondering if it could be done in Python in general, not only with numpy (ie with any other library). It also answers that essentially, the padding method (the one I assume MATLAB uses) I'm asking about isn't currently possible on numpy and that does accurately answer that part of the question. – Ariel A Feb 20 '20 at 04:05
  • @hpaulj changed it from `A2[n][m] = 17` to your suggestion. Is there a reason why this is the preferred indexing method? – Ariel A Feb 20 '20 at 04:13
    @ArielA: For example, `A2[:, 1]` does what it looks like it should, selecting a single column. `A2[:][1]` doesn't, because it's two separate indexing operations that don't interact the way people writing `A2[:][1]` expect. – user2357112 Feb 20 '20 at 04:32
  • @hpaulj MATLAB implements it by creating a new matrix of the new size, copying all the data to the new matrix, then deleting the old matrix. This all happens behind-the-scenes. – TheBlackCat Mar 05 '20 at 18:49

1 Answer


This can't be done in numpy, and technically it can't be done in MATLAB either. What MATLAB does behind the scenes is create an entirely new matrix, copy all the data into it, then delete the old matrix. It is not dynamically resizing; that isn't actually possible given how arrays/matrices are stored in memory. This is extremely slow, especially for large arrays, which is why MATLAB nowadays warns you not to do it.

Numpy, like MATLAB, cannot resize arrays in place (actually, unlike MATLAB it technically can with `ndarray.resize`, but only when nothing else references the array, so I would advise against trying). But in order to avoid the sort of confusion and slow code this causes in MATLAB, numpy requires that you explicitly make the new array (using `np.zeros`) and then copy the data over.
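Spelled out, the explicit copy-over pattern looks like this (array names are mine):

```python
import numpy as np

old = np.arange(6).reshape(2, 3)

# "Resize" by allocating the larger array explicitly, then copying.
new = np.zeros((4, 5), dtype=old.dtype)
new[:old.shape[0], :old.shape[1]] = old

print(new.shape)   # (4, 5)
print(new[1, 2])   # 5  (old data preserved; new cells are zero)
```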

Python, unlike MATLAB, actually does have a truly resizable data structure: the list. Lists still require an index to exist before you can assign to it, which avoids the silent indexing errors that are hard to catch in MATLAB, but appending to a list has very good performance. You can make an effectively n-dimensional structure by using nested lists of lists. Then, once the list is done, you can convert it to a numpy array.
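For example, a sketch of building up nested lists and converting once at the end:

```python
import numpy as np

rows = []
for i in range(3):
    row = []
    for j in range(4):
        row.append(i * 4 + j)  # lists grow in place as needed
    rows.append(row)

arr = np.array(rows)  # one conversion once the size is final
print(arr.shape)   # (3, 4)
print(arr[2, 3])   # 11
```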

TheBlackCat
  • Thank you for the answer. Do you think that theoretically at least, it would be possible to create a class that is a dynamic array, without forcing a copy over from one array to another (essentially one that is fast). That seems to be what would happen if you convert a list of lists to an array in python with numpy, except that you're still copying the lists to an array anyways. I suppose it won't be necessary if you just stick to lists in general... – Ariel A Mar 04 '20 at 16:24
    There already is such a thing, https://pypi.org/project/dynarray/. But you still need to explicitly resize it, again since allowing you to resize by indexing is a great way to get extremely hard-to-catch bugs. The advantage of copying lists to an array is that you only need to do it once. You can resize the lists as many times as you want, such as in a loop, then convert it to a numpy array just once. This is the main use-case I am aware of, since if you know the final size to begin with you can just create a numpy array of the right size from the start. – TheBlackCat Mar 05 '20 at 18:48
    I believe `javascript` arrays are dynamic in this sense, but the underlying data structure is more like the Python `dict`. In other words the index is really some sort of `key`. It doesn't fill-in the intermediate values. The fast `numpy` calculations depend on being able to traverse the array with `strides` in simple, fast `C` code, without index lookups and such. – hpaulj Mar 05 '20 at 19:10
    @hpaulj Python lists are stored in arrays rather than something like a dict. The reason they are slower is because they are arrays of references to other python objects. This is basically the same as a MATLAB cell array or numpy object array. The reason they are resizable is because Python tracks the size behind-the-scenes and allocates more than is necessary. The reason they are slow is because the processor has to access the underlying Python object rather than being able to directly access the raw data like in numpy. – TheBlackCat Mar 05 '20 at 19:17
  • This was very informative! Thank you for the interesting behind-the-scenes info. Do you mainly learn this from documentation, or is there a good book you all use (if it's experience, then hopefully I'll get to that point too)? – Ariel A Mar 06 '20 at 00:14