
I was wondering whether there are any situations in which a numpy array that owns its data is stored non-contiguously.

From a numerical point of view, non-contiguous, row- or column-aligned buffers make sense and are ubiquitous in performance libraries such as IPP. However, it seems that numpy by default converts anything passed as an argument to np.array into a contiguous buffer. As far as I can tell, this is not stated explicitly in the documentation.

My question is, does numpy guarantee that any owning array created with np.array is contiguous in memory? More generally, in which situations can we come across a non-contiguous owning array?

EDIT following @Eelco's answer

By non-contiguous, I mean that there are some "empty spaces" in the memory chunk used to store the data (strides[1] > shape[0] * itemsize if you will). I do not mean an array whose data is stored using two or more memory allocations; I would be surprised if such an owning numpy array existed. This seems to be consistent with numpy's terminology according to this answer.

By owning arrays, I mean arrays whose .flags.owndata is True. I am not interested in non-owning arrays, which can indeed behave wildly.
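To make the two terms concrete, here is a minimal sketch of how the flags in question can be inspected. The names owner, view, copied and describe are only illustrative, and the sketch shows what np.array does for one particular strided input; it is not a proof of any general guarantee.

```python
import numpy as np


def describe(arr):
    """Collect the ownership and contiguity flags of an array (illustrative helper)."""
    flags = arr.flags
    return {
        "owndata": bool(flags.owndata),
        "c_contiguous": bool(flags.c_contiguous),
        "f_contiguous": bool(flags.f_contiguous),
        "strides": arr.strides,
    }


owner = np.zeros((4, 6))   # freshly allocated: owns its buffer
view = owner[:, ::2]       # strided view: skips every other column, does not own data
copied = np.array(view)    # np.array copies by default, producing a new owning array

for name, arr in [("owner", owner), ("view", view), ("copied", copied)]:
    print(name, describe(arr))
```

In this example the view has gaps between its elements but does not own its data, while the copy made by np.array owns its data and is packed without gaps.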

P-Gn
  • I could imagine that things like `np.arange(16).reshape(4, 4).transpose()[2]` are non-contiguous or might be optimized on demand. – Alfe Jul 19 '17 at 11:41
  • reshape, transpose and the indexing each return a view; that is, any of those operations would result in a new ndarray object that does not own the memory, but rather refers to the result of arange as its .base attribute – Eelco Hoogendoorn Jul 19 '17 at 12:37

1 Answer


I've heard it said (no source, sorry) that all memory-owning arrays are indeed contiguous. And that makes sense: how can you own a non-contiguous block? It implies you'd have to make an arbitrary number of fragmented deallocation calls when that hypothetical object gets collected... And I think that's not even possible; I think one can only release the ranges originally allocated. Viewed from the other side: ownership originates at the time of allocation, and we can only ever allocate contiguous blocks. (At least that's how it works at the malloc level; you could have a software-based allocation layer on top of that which implements logic to handle such fragmented ownership, but if any such thing exists, it's news to me.)

I've contributed to jsonpickle to expand its numpy support, and this question came up there too. The code I wrote there would break (and quite horribly so) if someone were to feed it a non-contiguous owning array; it's been more than a year and I haven't seen any issues reported, so that's fairly strong empirical evidence, I'd say...

But if you are still worried about this leading to hard-to-track bugs (I don't think there is a limit to the shenanigans a C lib constructing a numpy array can get up to), I'd recommend simply asserting at runtime that no such frankenarrays ever get accidentally passed in to the wrong places.
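A runtime check along those lines could look roughly like the sketch below. The function name require_contiguous_if_owning is made up for illustration, and treating "C- or F-contiguous" as the definition of contiguous is an assumption; adjust it to whatever your code actually relies on.

```python
import numpy as np


def require_contiguous_if_owning(arr):
    """Raise if arr owns its data but is neither C- nor F-contiguous.

    Hypothetical guard: non-owning views pass through unchecked, since only
    owning arrays are assumed to be gap-free here.
    """
    flags = arr.flags
    if flags.owndata and not (flags.c_contiguous or flags.f_contiguous):
        raise ValueError(
            "owning array is not contiguous: shape=%r, strides=%r"
            % (arr.shape, arr.strides)
        )
    return arr


# Usage: call the guard at the boundary where arrays enter your code.
checked = require_contiguous_if_owning(np.ones((3, 4)))
also_ok = require_contiguous_if_owning(np.ones((3, 4))[:, ::2])  # a view, not checked
```

Returning the array lets the check be dropped inline around an argument without restructuring the calling code.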

Eelco Hoogendoorn
  • I think there is a misunderstanding about what contiguous means. (Maybe on my part.) Your data can be stored non-contiguously and you could still allocate a single block of memory to store it. I *assume* that "contiguous" in the numpy documentation refers to data stored without any "unused" memory space in between the data. – P-Gn Jul 19 '17 at 12:42
  • There is no doubt I think that numpy array data is always stored in a single chunk of memory. There is no way you could make use of strides otherwise. – P-Gn Jul 19 '17 at 12:48
  • There is indeed plenty of room for ambiguity here; one notion of contiguous is that the entire buffer of memory held by the array can in fact be addressed by some combination of its logical indices. Though there are quite a few subtleties to that definition; np.zeros(10)[1:] is not contiguous in that sense, though it is for most intents and purposes. – Eelco Hoogendoorn Jul 19 '17 at 12:54
  • numpy data does not in fact always need to be stored in a single allocation; the following is in fact a valid numpy array, in pseudocode: a = malloc(10); b = malloc(10); arr = np.ndarray(buffer=a, strides=[1, b-a], shape=[10, 2]). When something is a view, all bets are off, in theory at least. In practice there isn't much reason to worry about such scenarios. – Eelco Hoogendoorn Jul 19 '17 at 12:56
  • This is why I restricted explicitly the scope of my question to owning arrays. – P-Gn Jul 19 '17 at 12:58