Efficient usage of tensors in python-numpy

Question

I require an m-dimensional np.ndarray lattice structure, denoted by arr, with the following properties where m and n are constants (e.g. m=3,n=50):

arr.shape == (n, n, n, ..., n) where n in range(100)
len(arr.shape) == m where m in range(4)
so up to 100,000,000 lattice points

Is it better to store this as a 1D array and overload __getitem__ and __setitem__ or is numpy optimised in terms of memory storage for large arrays?
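For a sense of scale, here is a minimal sketch using the example values m=3, n=50. It assumes NumPy's default float64 dtype, which the question does not actually specify, so the byte count is illustrative only.

```python
import numpy as np

m, n = 3, 50                      # example values from the question
arr = np.zeros((n,) * m)          # assumption: default dtype float64, 8 bytes per item

print(arr.shape)                  # (50, 50, 50)
print(arr.size)                   # n**m = 125000 lattice points
print(arr.nbytes)                 # 125000 * 8 = 1000000 bytes of raw data
```

The raw data costs `size * itemsize` bytes; the shape and stride bookkeeping on top of that is small and does not grow with the number of lattice points.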

Your "for n in range(100)" and "for m in range(4)" are unclear. One array certainly can't have all of those shapes, and it's not clear whether you want the `n`s in `(n, n, n, ..., n)` to be the same, increasing, or what. — user2357112, Jun 27 '16 at 20:24
They are clear if you are not sloppy. Take note it is **not** as you suggest: *"`for m in range(4)` [sic]"*, rather, I wrote: *"for `m in range(4)`"*. If you read as code, in order of operations, it is also clear that `n` and `m` are fixed. — Alexander McFarlane, Jun 27 '16 at 20:28
Are you trying to say "for *some* `m` in `range(4)`"? The usual implication with "for x in y" is "for *all* x in y", in both math and Python. — user2357112, Jun 27 '16 at 20:34
I appreciate that many questions are sloppy on this site, which can inadvertently lead to your confusion. However, in this instance I mean precisely what I write. Pick the *arbitrary* values say `m,n = 2,5` and evaluate each statement such that it is `True` — Alexander McFarlane, Jun 27 '16 at 20:38
@user2357112 perhaps it is more obvious by using the word *where* in place of *for* to remove that mistaken implication of a statement such as `for m in range(4)`. I have edited as such and made clear that `m, n` are pre-fixed although now the question is longer. — Alexander McFarlane, Jun 27 '16 at 20:52

Mike Müller · Accepted Answer · 2016-06-27T20:32:54.293


NumPy always stores the actual data in a one-dimensional block of memory. The multi-dimensionality comes from the ndarray object, which maps an N-dimensional index onto that block. So there is no need to overload __getitem__() and __setitem__(); NumPy already does this for you.
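A small sketch of that point, using illustrative values n=5, m=3 that are not from the question: a 1D array reshaped to m dimensions is a view onto the same memory, so writing through the m-dimensional index changes the 1D data.

```python
import numpy as np

n, m = 5, 3                                  # small illustrative values
flat = np.arange(n ** m, dtype=np.float64)   # one 1D buffer
arr = flat.reshape((n,) * m)                 # m-dimensional view, no copy

arr[1, 2, 3] = -1.0                          # write through the m-dimensional view
print(flat[1 * n * n + 2 * n + 3])           # -1.0, the same memory seen as 1D
print(np.shares_memory(flat, arr))           # True
```

The hand-written index `1*n*n + 2*n + 3` is exactly the mapping a custom `__getitem__` would have to implement for the default C (row-major) order.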

edited Jun 27 '16 at 20:32

answered Jun 27 '16 at 20:24


Unless I am mistaken, the important sentence is: *"The data buffer is typically what people think of as arrays in C or Fortran, a contiguous (and fixed) block of memory containing fixed sized data items"* ... meaning that an `m`-dimensional array will just consist of offset pointers into this contiguous blob when referring to each dimension where `m>1`? – Alexander McFarlane Jun 27 '16 at 20:48
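As a hedged illustration of how that lookup works: the array does not hold a pointer per dimension, only a `strides` tuple giving the number of bytes to step along each axis. The shape (4, 5, 6) and the index below are arbitrary choices for the sketch.

```python
import numpy as np

arr = np.arange(4 * 5 * 6, dtype=np.int64).reshape(4, 5, 6)
print(arr.strides)                 # (240, 48, 8): bytes to step along each axis

i, j, k = 2, 3, 4
byte_offset = i * arr.strides[0] + j * arr.strides[1] + k * arr.strides[2]
flat_index = byte_offset // arr.itemsize

print(flat_index)                                # 82
print(arr[i, j, k] == arr.ravel()[flat_index])   # True
```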

I find this answer confusing. 1) You seem to be implying that numpy arrays are always stored contiguously in memory no matter what, which is not true (otherwise the ascontiguousarray function would be pretty silly). Could you clarify what you mean? – en_Knight Jun 27 '16 at 21:03
@en_Knight did you read the link? I assumed you did, so I became more interested and followed the reference on the link to the MIT book ["Guide to Numpy"](http://web.mit.edu/dvp/Public/numpybook.pdf#153). Under section *"Memory Layout of ndarray"* the author writes: *"On a fundamental level, an N-dimensional array object is just a one-dimensional sequence of memory with fancy indexing code that maps an N-dimensional index into a one-dimensional index."* See the Note at the top of page 30 for information on how non-contiguous views of a contiguous array are interpreted. – Alexander McFarlane Jun 28 '16 at 03:12
@MikeMüller - can you add this to your answer? I'll give you the remaining up-vote, as it will be a really good answer with that addition. – Alexander McFarlane Jun 28 '16 at 03:17

@AlexanderMcFarlane that is the "fundamental" way to consider an array, but in practice you may often find it isn't actually contiguous as you expect. If you have control over the array's initialization the risk is much smaller, but common issues are C vs A vs F contiguousness (http://www.scipy-lectures.org/advanced/advanced_numpy/#c-and-fortran-order), views, and strides (http://www.scipy-lectures.org/advanced/advanced_numpy/#slicing-with-integers) – en_Knight Jun 28 '16 at 03:30
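A short sketch of how an array can stop being contiguous, with arbitrary example shapes: a fresh array is C-contiguous, its transpose is Fortran-contiguous, and a strided slice is neither. `np.ascontiguousarray` then copies the data into a new C-contiguous buffer.

```python
import numpy as np

a = np.zeros((4, 6))               # freshly allocated: C-contiguous
t = a.T                            # transposed view: Fortran-contiguous
s = a[:, ::2]                      # every other column: neither

print(a.flags['C_CONTIGUOUS'], a.flags['F_CONTIGUOUS'])   # True False
print(t.flags['C_CONTIGUOUS'], t.flags['F_CONTIGUOUS'])   # False True
print(s.flags['C_CONTIGUOUS'], s.flags['F_CONTIGUOUS'])   # False False

c = np.ascontiguousarray(s)        # copies into a new C-contiguous buffer
print(c.flags['C_CONTIGUOUS'], np.shares_memory(c, s))    # True False
```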

Just knowing if something is a view or not can be extremely difficult (http://stackoverflow.com/questions/11524664/how-can-i-tell-if-numpy-creates-a-view-or-a-copy). Again, if you control your own data this might not be an issue, but this answer confused me by simplifying numpy down so far - I think it could be more helpful by addressing the intricacies more thoroughly – en_Knight Jun 28 '16 at 03:32
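One way to check after the fact, sketched on a small made-up array: `np.shares_memory` reports whether two arrays overlap in memory. Basic slicing gives a view, while indexing with a list of integers gives a copy.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
sliced = a[:, 1:3]                 # basic slicing returns a view
fancy = a[:, [1, 2]]               # integer-array indexing returns a copy

print(np.shares_memory(a, sliced)) # True
print(np.shares_memory(a, fancy))  # False

sliced[0, 0] = 99                  # the write is visible in a
print(a[0, 1])                     # 99
print(fancy[0, 0])                 # 1, the copy is unaffected
```

This only tells you what a given result is; it does not predict in advance which operations return views, which is the difficulty the linked question discusses.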
No, what you say is very interesting. I thought I covered the strides of non-contiguous arrays by mentioning a view of a contiguous array, as in the book on pg. 30. You seem to be alluding that striding is not as efficient as populating from a contiguous buffer. Why would this be the case if pointers are offsetting the memory allocations? – Alexander McFarlane Jun 28 '16 at 03:34
@AlexanderMcFarlane I don't really understand what you're saying. But the statement "striding is not as efficient as populating from a contiguous buffer" can be very true when caching comes into play (which is a major part of low-level optimizations): http://stackoverflow.com/questions/9936132/why-does-the-order-of-the-loops-affect-performance-when-iterating-over-a-2d-arra – en_Knight Jun 30 '16 at 21:49
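A rough way to try this yourself, as a sketch rather than a benchmark: the helper names `sum_rows` and `sum_cols` and the 2000x2000 size are made up for illustration. Both functions compute the same total, but one walks contiguous rows and the other walks strided columns. Timings depend on the machine and cache sizes, so none are shown.

```python
import timeit
import numpy as np

a = np.random.rand(2000, 2000)     # C order: each row is contiguous

def sum_rows(x):
    # inner slices x[i, :] are contiguous in memory (stride of 8 bytes)
    return sum(x[i, :].sum() for i in range(x.shape[0]))

def sum_cols(x):
    # inner slices x[:, j] skip a full row (16000 bytes) per element
    return sum(x[:, j].sum() for j in range(x.shape[1]))

print(np.isclose(sum_rows(a), sum_cols(a)))          # same result either way
print(timeit.timeit(lambda: sum_rows(a), number=3))  # timings vary by machine
print(timeit.timeit(lambda: sum_cols(a), number=3))
```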
