
Seeing this answer, I am wondering whether these ways of creating a flattened view of X are essentially the same, as long as I know that the number of axes in X is 3:

A = X.ravel()

s0, s1, s2 = X.shape
B = X.reshape(s0*s1*s2)

C = X.reshape(-1)  # thanks to @hpaulj below

I'm not asking if A and B and C are the same.

I'm wondering whether using ravel and reshape in this situation is essentially the same, or whether there are significant differences, advantages, or disadvantages to one or the other, provided that you know the number of axes of X ahead of time.

The second method takes a few microseconds longer, but that does not seem to depend on the size of X.

uhoh

1 Answer


Look at their __array_interface__ and do some timings. The only difference that I can see is that ravel is faster.

.flatten() has a more significant difference - it returns a copy.
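As a rough sketch of that check (the array X and the helper data_ptr are hypothetical names chosen for illustration, and X is assumed to be C-contiguous), one could compare the data pointers reported by __array_interface__ and take some crude timings:

```python
import timeit
import numpy as np

# Hypothetical C-contiguous 3-d array, used only for illustration.
X = np.arange(24).reshape(2, 3, 4)

A = X.ravel()       # view when X is C-contiguous
C = X.reshape(-1)   # view when X is C-contiguous
F = X.flatten()     # always a copy

def data_ptr(arr):
    """Address of the first element, read from __array_interface__."""
    return arr.__array_interface__['data'][0]

# True means the result starts at the same memory address as X.
same_ptr = {
    "ravel": data_ptr(A) == data_ptr(X),
    "reshape": data_ptr(C) == data_ptr(X),
    "flatten": data_ptr(F) == data_ptr(X),
}
print(same_ptr)

# Crude timings; absolute numbers depend on the machine and NumPy version.
for stmt in ("X.ravel()", "X.reshape(-1)", "X.flatten()"):
    seconds = timeit.timeit(stmt, globals={"X": X}, number=100_000)
    print(f"{stmt:15s} {seconds / 100_000 * 1e6:.3f} us per call")
```

For a contiguous X, the ravel and reshape results should report the same data pointer as X, while flatten reports a different one because it allocates a new buffer.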

A.reshape(-1)

is a simpler way to use reshape.

You could study the respective docs, and see if there is something else. I haven't explored what happens when you specify order.

I would use ravel if I just want it to be 1d. I use .reshape most often to change a 1d (e.g. arange()) to nd.

e.g.

np.arange(10).reshape(2,5).ravel()

Or choose the one that makes your code most readable.


reshape and ravel are defined in numpy C code:

In https://github.com/numpy/numpy/blob/0703f55f4db7a87c5a9e02d5165309994b9b13fd/numpy/core/src/multiarray/shape.c

PyArray_Ravel(PyArrayObject *arr, NPY_ORDER order) requires nearly 100 lines of C code. And it punts to PyArray_Flatten if the order changes.

In the same file, reshape punts to newshape. That in turn returns a view if the shape doesn't actually change, tries _attempt_nocopy_reshape, and as a last resort returns a PyArray_NewCopy.

Both make use of PyArray_Newshape and PyArray_NewFromDescr - depending on how shapes and order mix and match.

So identifying where reshape (to 1d) and ravel are different would require careful study.
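Short of reading the C code, the view-versus-copy behaviour can be probed from Python. The following is only a sketch, with a hypothetical array X and its transposed view XT; it uses np.shares_memory to see which calls return a view and which fall back to a copy:

```python
import numpy as np

# Hypothetical arrays: X is C-contiguous, XT is a transposed
# (F-contiguous) view onto the same buffer.
X = np.arange(24).reshape(2, 3, 4)
XT = X.transpose(2, 1, 0)

results = {
    # Contiguous input, default order: both return views.
    "X.ravel()": np.shares_memory(X.ravel(), X),
    "X.reshape(-1)": np.shares_memory(X.reshape(-1), X),
    # Transposed input, default (C) order: both have to copy.
    "XT.ravel()": np.shares_memory(XT.ravel(), X),
    "XT.reshape(-1)": np.shares_memory(XT.reshape(-1), X),
    # Changing the order on a C-contiguous array forces a copy ...
    "X.ravel(order='F')": np.shares_memory(X.ravel(order='F'), X),
    # ... while F order on the F-contiguous transpose is again a view.
    "XT.ravel(order='F')": np.shares_memory(XT.ravel(order='F'), X),
}

for call, is_view in results.items():
    print(f"{call:22s} shares memory with X: {is_view}")
```

In these cases ravel and reshape(-1) agree on when a view is possible, which fits the reading that both end up in the same reshaping machinery.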


Another way to do this ravel is to make a new array, with a new shape, but the same data buffer:

np.ndarray((24,),buffer=A.data)

It times the same as reshape. Its __array_interface__ is the same. I don't recommend using this method, but it may clarify what is going on with these reshape/ravel functions. They all make a new array, with a new shape, but with shared data (if possible). Timing differences are the result of different sequences of function calls - in Python and C - not of different handling of the data.
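A slightly fuller sketch of that buffer trick, using a hypothetical C-contiguous array X. The dtype is passed explicitly here because np.ndarray defaults to a float dtype, which would reinterpret an integer buffer:

```python
import numpy as np

# Hypothetical C-contiguous array, used only for illustration.
X = np.arange(24).reshape(2, 3, 4)

# New 1-d array object on top of the same data buffer.
# dtype is given explicitly so the bytes are interpreted as in X.
D = np.ndarray((X.size,), dtype=X.dtype, buffer=X.data)

same_start = (D.__array_interface__['data'][0]
              == X.__array_interface__['data'][0])
print(same_start, np.shares_memory(D, X))

# Writing through D is visible in X, because no data was copied.
D[0] = 99
print(X[0, 0, 0])
```

As with the original one-liner, this is meant to show what the reshaping functions do under the hood, not as a recommended way to flatten an array.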

hpaulj
  • Thanks @hpaulj, but I'm looking for a definitive answer. The extra microsecond(s) must mean something. However the `.reshape(-1)` is very helpful! – uhoh Oct 14 '15 at 07:36
  • That requires finding them in the compiled code. Clearly `ravel` takes a direct route, `reshape` a more general one. Do I have to do the digging or will you? – hpaulj Oct 14 '15 at 09:54
  • You do not "have to do" anything of course! Answering questions is voluntary - no? But right now I can only read python. Maybe someone else might know? – uhoh Oct 14 '15 at 09:59
  • We are talking about times where one level of function redirection makes a big difference. As long as the result is a `view`, there is no difference in how the 'data' is handled. – hpaulj Oct 14 '15 at 17:46
  • Wow! That is what I call a *definitive answer*!! I (and others) can actually learn a lot from your discussion. This is really helpful and I appreciate your time "going deep". Now I understand the significant time differences that do not scale with size. So far, I get the feeling that `ravel()` is preferred for **flattened views** since it seems to be fastest. – uhoh Oct 16 '15 at 07:34