As the title says, I was trying to verify the memory ordering of NumPy arrays by switching the `order` argument in the following test script:

import numpy as np


# Standard array
arr = [[1, 2, 3], [-7, -8, -9], ['A', 'B', 'C']]
print(arr, '\n')

for row_index, row_entries in enumerate(arr):
    print('Row ' + str(row_index+1))
    for column_index, column_entries in enumerate(row_entries):
        print(' Column ' + str(column_index+1) + '\n', '\t [' + str(column_entries) + ']')


# NumPy array
arr = np.asarray([[1, 2, 3], [-7, -8, -9], ['A', 'B', 'C']], order='F')    # Try 'C' vs. 'F'!!
print('\n\n', arr, '\n')

for row_index, row_entries in enumerate(arr):
    print('Row ' + str(row_index+1))
    for column_index, column_entries in enumerate(row_entries):
        print(' Column ' + str(column_index+1) + '\n', '\t [' + str(column_entries) + ']')

----------------------------------------------------------------------------------------------
Output:

[[1, 2, 3], [-7, -8, -9], ['A', 'B', 'C']] 

Row 1
 Column 1
         [1]
 Column 2
         [2]
 Column 3
         [3]
Row 2
 Column 1
         [-7]
 Column 2
         [-8]
 Column 3
         [-9]
Row 3
 Column 1
         [A]
 Column 2
         [B]
 Column 3
         [C]


 [['1' '2' '3']
 ['-7' '-8' '-9']
 ['A' 'B' 'C']] 

Row 1
 Column 1
         [1]
 Column 2
         [2]
 Column 3
         [3]
Row 2
 Column 1
         [-7]
 Column 2
         [-8]
 Column 3
         [-9]
Row 3
 Column 1
         [A]
 Column 2
         [B]
 Column 3
         [C]

Why am I getting identical outputs?

  • Row-major and column-major do not change the indexing order. They are implementation details of how the contiguous block of memory behind a numpy array is laid out. To get different outputs, use np.nditer – Dani Mesejo Aug 07 '22 at 07:42
  • @DaniMesejo So changing the `order` only affects the "under-the-hood" memory management, but not the actual indexing in the interpreter?? Meaning that it has no practical effect other than maybe some performance optimizations (e.g. for very large array operations)? – mattze_frisch Aug 07 '22 at 08:02
  • Correct. It also serves to load data stored in row- or column-major order – Dani Mesejo Aug 07 '22 at 08:06
  • https://ncar-hackathons.github.io/scientific-computing/numpy/02_memory_layout.html – Dani Mesejo Aug 07 '22 at 08:06
  • Also, am I right that Python always unpacks arrays from outside to inside - and that this is the only factor determining the outcome of my test script above? **Meaning** that I could simply rename `Row` to `Column` and vice versa and would end up with a transposed array (purely by nomenclature/labeling)?? – mattze_frisch Aug 07 '22 at 08:07
  • I didn't understand the question. If you want the transpose, use arr.T – Dani Mesejo Aug 07 '22 at 08:24
  • @DaniMesejo My question basically is if what we're seeing here is simply the fact that concepts of "rows" and "columns" don't exist for the computer (as they are simply labels given by the user). The only convention here seems to be the order of unpacking when running `enumerate`, i.e. the fact that Python is set to unpack the outermost brackets first and work inwards. – mattze_frisch Aug 07 '22 at 08:40
  • ...but then I wonder why changing the `order`, i.e. the packing/unpacking order in the memory (i.e. the data structure) doesn't get reflected in the packing/unpacking order by the interpreter? Shouldn't this be consistent? – mattze_frisch Aug 07 '22 at 08:43
  • No, because the "interpreter" reads each memory layout differently. Note that column-major and row-major storage represent the same matrix, so it is expected that both arrays look the same regardless of the order – Dani Mesejo Aug 07 '22 at 10:24
  • Look at the docs for `np.reshape`. I tried to explain it at https://stackoverflow.com/questions/45973722/how-does-numpy-reshape-with-order-f-work – hpaulj Aug 07 '22 at 11:30

1 Answer

You start with a list (of lists):

In [29]: alist = [[1, 2, 3], [-7, -8, -9], ['A', 'B', 'C']]
In [30]: alist
Out[30]: [[1, 2, 3], [-7, -8, -9], ['A', 'B', 'C']]

Obviously we can iterate through the list, and through the sublists.

We can make an array from that list. Usually we don't specify the order, but the default is 'C':

In [31]: arr1 = np.array(alist, order='C')
In [32]: arr1
Out[32]: 
array([['1', '2', '3'],
       ['-7', '-8', '-9'],
       ['A', 'B', 'C']], dtype='<U21')

Note that the dtype is string, because the list mixes numbers and strings (I suppose I could have specified dtype=object).

Same thing but with 'F':

In [34]: arr2 = np.array(alist, order='F')
In [35]: arr2
Out[35]: 
array([['1', '2', '3'],
       ['-7', '-8', '-9'],
       ['A', 'B', 'C']], dtype='<U21')

Display is the same.

To see how the elements are stored we have to 'ravel' the arrays. The result is a new 1d array. See np.reshape or np.ravel docs for the use of 'K' order:

In [36]: arr1.ravel('K')
Out[36]: array(['1', '2', '3', '-7', '-8', '-9', 'A', 'B', 'C'], dtype='<U21')

In [38]: arr2.ravel('K')
Out[38]: array(['1', '-7', 'A', '2', '-8', 'B', '3', '-9', 'C'], dtype='<U21')

Here the values of arr2 are read down the columns. Raveling the first array with 'F' order produces the same thing:

In [39]: arr1.ravel('F')
Out[39]: array(['1', '-7', 'A', '2', '-8', 'B', '3', '-9', 'C'], dtype='<U21')

Iterating the way you do doesn't change with the order. It effectively treats the array as a list of rows, always stepping along the first axis.

In [40]: [row for row in arr1]
Out[40]: 
[array(['1', '2', '3'], dtype='<U21'),
 array(['-7', '-8', '-9'], dtype='<U21'),
 array(['A', 'B', 'C'], dtype='<U21')]
In [41]: [row for row in arr2]
Out[41]: 
[array(['1', '2', '3'], dtype='<U21'),
 array(['-7', '-8', '-9'], dtype='<U21'),
 array(['A', 'B', 'C'], dtype='<U21')]
In [42]: arr2.tolist()
Out[42]: [['1', '2', '3'], ['-7', '-8', '-9'], ['A', 'B', 'C']]
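As a quick check of the point above, here is a minimal sketch (the names `c_arr` and `f_arr` are just for illustration) showing that indexing and Python-level iteration give the same values for both orders, while only the contiguity flags differ:

```python
import numpy as np

data = [[1, 2, 3], [4, 5, 6]]
c_arr = np.array(data, order='C')
f_arr = np.array(data, order='F')

# Same logical array: every index returns the same value.
print(np.array_equal(c_arr, f_arr))      # True
print(c_arr[0, 1], f_arr[0, 1])          # 2 2

# Python-level iteration walks the first axis in both cases.
print([row.tolist() for row in f_arr])   # [[1, 2, 3], [4, 5, 6]]

# Only the memory-layout flags differ.
print(c_arr.flags['C_CONTIGUOUS'], c_arr.flags['F_CONTIGUOUS'])  # True False
print(f_arr.flags['C_CONTIGUOUS'], f_arr.flags['F_CONTIGUOUS'])  # False True
```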

You have to use numpy's own methods and tools to see the effect of order. order is more useful when creating an array via reshape:

In [43]: np.arange(12)
Out[43]: array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
In [44]: np.arange(12).reshape(3,4)
Out[44]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
In [45]: np.arange(12).reshape(3,4,order='F')
Out[45]: 
array([[ 0,  3,  6,  9],
       [ 1,  4,  7, 10],
       [ 2,  5,  8, 11]])

Tweaking the shape, and then applying a transpose:

In [46]: np.arange(12).reshape(4,3,order='F')
Out[46]: 
array([[ 0,  4,  8],
       [ 1,  5,  9],
       [ 2,  6, 10],
       [ 3,  7, 11]])
In [47]: np.arange(12).reshape(4,3,order='F').T
Out[47]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
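The relationship between the two reshape orders and the transpose can be written as a pair of identities. This is only a sketch for the 2d case shown above; the variable names are mine:

```python
import numpy as np

flat = np.arange(12)

c_shaped = flat.reshape(3, 4)             # fill row by row
f_shaped = flat.reshape(3, 4, order='F')  # fill column by column

# An 'F' reshape to (4, 3) followed by a transpose reproduces the 'C' reshape to (3, 4).
print(np.array_equal(flat.reshape(4, 3, order='F').T, c_shaped))  # True

# And the other way round: a 'C' reshape to (4, 3), transposed, matches the 'F' reshape.
print(np.array_equal(flat.reshape(4, 3).T, f_shaped))             # True
```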

edit

It may be clearer if I make a smaller array with just 1 byte per element.

The 2 orders:

In [70]: arr1 = np.array([[1,2,3],[4,5,6]], 'uint8')
In [72]: arr2 = np.array([[1,2,3],[4,5,6]], 'uint8',order='F')
In [73]: arr1
Out[73]: 
array([[1, 2, 3],
       [4, 5, 6]], dtype=uint8)
In [74]: arr2
Out[74]: 
array([[1, 2, 3],
       [4, 5, 6]], dtype=uint8)

Instead of ravel, use tobytes with 'A' order to preserve the underlying order (see tobytes docs):

In [75]: arr1.tobytes(order='A')
Out[75]: b'\x01\x02\x03\x04\x05\x06'
In [76]: arr2.tobytes(order='A')
Out[76]: b'\x01\x04\x02\x05\x03\x06'

The difference can also be seen in the strides:

In [77]: arr1.strides
Out[77]: (3, 1)
In [78]: arr2.strides
Out[78]: (1, 2)

strides controls how numpy iterates through the array in compiled code (but not when using python level iteration).
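To make the role of strides concrete, here is a rough sketch that looks up an element directly in the raw bytes. `element_from_buffer` is a hypothetical helper, not a numpy function, and it assumes a contiguous 2d array with 1-byte elements (so a byte offset is also a position in the buffer):

```python
import numpy as np

def element_from_buffer(arr, i, j):
    """Find arr[i, j] in the raw bytes using only the strides.

    Assumes a contiguous 2d array of 1-byte elements (illustration only).
    """
    raw = arr.tobytes(order='A')   # bytes in the array's own memory order
    offset = i * arr.strides[0] + j * arr.strides[1]
    return raw[offset]

arr1 = np.array([[1, 2, 3], [4, 5, 6]], 'uint8')             # strides (3, 1)
arr2 = np.array([[1, 2, 3], [4, 5, 6]], 'uint8', order='F')  # strides (1, 2)

# Different buffers and strides, same answer for the same index.
print(element_from_buffer(arr1, 1, 2), element_from_buffer(arr2, 1, 2))  # 6 6
print(element_from_buffer(arr1, 0, 1), element_from_buffer(arr2, 0, 1))  # 2 2
```

The index `[i, j]` means the same thing for both arrays; only the arithmetic that maps it to a position in the buffer changes.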

A comment suggested using nditer to iterate via numpy's own methods. Generally I don't recommend using nditer, but here it is illustrative:

In [79]: [i for i in np.nditer(arr1)]
Out[79]: 
[array(1, dtype=uint8),
 array(2, dtype=uint8),
 array(3, dtype=uint8),
 array(4, dtype=uint8),
 array(5, dtype=uint8),
 array(6, dtype=uint8)]
In [80]: [i for i in np.nditer(arr2)]
Out[80]: 
[array(1, dtype=uint8),
 array(4, dtype=uint8),
 array(2, dtype=uint8),
 array(5, dtype=uint8),
 array(3, dtype=uint8),
 array(6, dtype=uint8)]

nditer takes an order, but 'K' is default (in contrast to many other cases where 'C' is the default).
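As a small illustration of that parameter (a sketch, reusing the 'F' ordered array from above), passing order='C' explicitly makes nditer follow the logical row-by-row order instead of the memory order:

```python
import numpy as np

arr2 = np.array([[1, 2, 3], [4, 5, 6]], 'uint8', order='F')

# Default order='K' follows the layout in memory.
memory_order = [int(x) for x in np.nditer(arr2)]
# order='C' forces row-by-row traversal regardless of layout.
logical_order = [int(x) for x in np.nditer(arr2, order='C')]

print(memory_order)    # [1, 4, 2, 5, 3, 6]
print(logical_order)   # [1, 2, 3, 4, 5, 6]
```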

hpaulj
  • Nice answer, especially the edited part using `tobytes()` is most instructive. A few questions: 1.) What is the `dtype=' – mattze_frisch Aug 07 '22 at 19:35
  • Your list contained numbers and strings. Making an array from that makes them all strings (unless otherwise specified) - note the quotes. `print(arr)` shows the quotes, but not the `dtype`. My interactive display shows the longer `repr` format. – hpaulj Aug 07 '22 at 21:03
  • I don't know the difference between 'A' and 'K' ('any' v 'samekind'?). As my examples show, these can be used to preserve the `order` when doing a `ravel` or other `reshape`. The link that recommends against it is NOT the final word on the subject - it's someone's opinion. Most of the time you don't need to specify the `order`. Just be aware that 'C' is the default, but some operations like `transpose` can create an 'F' order. Order may also matter when converting MATLAB code. For a more in-depth study, you need to learn about `strides`. – hpaulj Aug 07 '22 at 21:13
  • Thanks for this explanation. Are you sure `transpose` will create a Fortran order? I'll look up `strides`. – mattze_frisch Aug 08 '22 at 00:46
  • `transpose` returns a `view` with new shape and strides. The effect on a 2d C-contiguous array is to make it F-contiguous. But the effect on a 1d or 3d array isn't so simple. – hpaulj Aug 08 '22 at 14:37
  • Wait...are you saying that `order` and `strides` are the same thing? I.e. changing `order='C'` to `order='F'` is identical to flipping `strides` (although I'm not sure if there exists a NumPy method that does specifically that)? – mattze_frisch Aug 08 '22 at 18:59
  • Or put differently, does `strides` simply offer a readout of row- and column-wise distances in memory as defined by `order` (i.e. "grid distances" or "lattice constants" in case one wanted to allude to crystals)? – mattze_frisch Aug 08 '22 at 19:07
  • PS.: Also, could it be that these things are handled much more explicitly in C/C++ as the user has to set types and preallocate memory manually? – mattze_frisch Aug 08 '22 at 19:15
  • PPS.: See also [note](https://numpy.org/doc/stable/reference/generated/numpy.lib.stride_tricks.as_strided.html) at the bottom of the docs – mattze_frisch Aug 08 '22 at 19:19
  • `as_strided` is an advanced topic that doesn't need to be considered here. The basic thing is that the data is stored in a flat C array, just one big buffer. The `shape`, `strides` and `dtype` are used together to traverse that buffer. Thus the same data buffer can "store" a 1d array, 2d, 3d or more, with ints, floats, bytes. 'Multidimensionality' is purely a function of how those attributes define low level iterations. – hpaulj Aug 08 '22 at 19:36
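The questions in these comments about `transpose`, `order` and `strides` can be probed with a short sketch. It uses 1-byte elements so the strides are easy to read, and the variable names are mine. It suggests that strides are derived from the shape together with the order, so an 'F' layout is not simply the 'C' strides flipped:

```python
import numpy as np

a = np.arange(6, dtype='uint8').reshape(2, 3)  # C-contiguous, strides (3, 1)
t = a.T                                        # a view: shape and strides are swapped

print(a.strides, t.strides)      # (3, 1) (1, 3)
print(t.flags['F_CONTIGUOUS'])   # True
print(np.shares_memory(a, t))    # True, no data was copied

# An 'F' ordered copy with the same shape as `a` has strides computed
# from that shape, so they are not just the reverse of (3, 1).
f_copy = np.asfortranarray(a)
print(f_copy.shape, f_copy.strides)  # (2, 3) (1, 2)

# On a 1d array the transpose changes nothing.
one_d = np.arange(3, dtype='uint8')
print(one_d.T.strides == one_d.strides)  # True
```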