
What does the order parameter in numpy.array() do?

The documentation I link to says that it specifies the contiguous order of the array, but I have no idea what that is supposed to mean. So what is contiguous order?

Copy of the order parameter documentation:

order : {‘C’, ‘F’, ‘A’}, optional Specify the order of the array. If order is ‘C’ (default), then the array will be in C-contiguous order (last-index varies the fastest). If order is ‘F’, then the returned array will be in Fortran-contiguous order (first-index varies the fastest). If order is ‘A’, then the returned array may be in any order (either C-, Fortran-contiguous, or even discontiguous).

Horse SMith
  • I think it's related to how the array is represented in memory. By default it will be a contiguous sequence of bytes in memory. – Marcin Dec 03 '14 at 08:18
  • The main benefit would be that copying such arrays would be very, very fast. You would just copy one area of memory to another as a whole, instead of element by element. This could be done using [memcpy](http://www.cplusplus.com/reference/cstring/memcpy/) in C/C++. – Marcin Dec 03 '14 at 08:40
  • A couple of questions which might help: [here](http://stackoverflow.com/q/26998223/3923281) and [here](http://stackoverflow.com/q/4059363/3923281). – Alex Riley Dec 03 '14 at 10:31
  • @ajcr The second one makes some sense of it, and it goes along with what Marcin said. The 'C' option could be contiguous as seen by C, and the 'F' by Fortran. What is the difference and why use either? What does it actually do, and why would you use the 'A' option? :s – Horse SMith Dec 03 '14 at 11:35
  • The difference between C and F is just whether the array is row major or column major (i.e. whether row or column entries are stored in adjacent memory addresses). C order means that operating row-wise on the array will be slightly quicker. F order means that column-wise operations will be faster. Specifying `A` means that the created array is not required to have either order: it is allowed to be discontiguous in memory (if, for example, array `a` is not contiguous then `a.copy('A')` may also be discontiguous). – Alex Riley Dec 03 '14 at 12:45
  • Possible duplicate of [What is the difference between contiguous and non-contiguous arrays?](https://stackoverflow.com/questions/26998223/what-is-the-difference-between-contiguous-and-non-contiguous-arrays) – gauteh Jul 28 '17 at 09:34
  • @AlexRiley: How about (please) formulate your comment into an answer because it's helpful? – NeoZoom.lua Apr 25 '18 at 06:36
  • Or (please) close this question as a possible duplicate may be helpful too. – NeoZoom.lua Apr 25 '18 at 06:43

1 Answer


Let's first unpack what K, A, C, and F stand for. I am referring to the implementation details section of this.

  • C is C-contiguous layout. Mathematically speaking, row major.
  • F is Fortran-contiguous layout. Mathematically speaking, column major.
  • A is any order. Generally don't use this.
  • K is keep order. Generally don't use this.
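
To make the C versus F distinction concrete, here is a minimal sketch (not taken from the documentation) that builds the same 2x3 array in both orders and inspects how it sits in memory. The variable names are illustrative, and `dtype=np.int64` is chosen only so that the strides are predictable (8 bytes per element).

```python
import numpy as np

values = [[1, 2, 3], [4, 5, 6]]

# Same logical array, two different memory layouts.
a = np.array(values, dtype=np.int64, order='C')  # row major
f = np.array(values, dtype=np.int64, order='F')  # column major

# Indexing gives identical results regardless of layout.
same_values = np.array_equal(a, f)

# ravel(order='K') reads the elements in the order they occur in memory.
c_memory = a.ravel(order='K').tolist()  # rows stored one after another
f_memory = f.ravel(order='K').tolist()  # columns stored one after another

# Strides: bytes to step to move one position along each axis.
c_strides = a.strides  # last index varies fastest
f_strides = f.strides  # first index varies fastest

print(same_values, c_memory, f_memory, c_strides, f_strides)
print(a.flags['C_CONTIGUOUS'], f.flags['F_CONTIGUOUS'])
```

The two arrays compare equal element by element; only the order of the underlying buffer and the strides differ.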

From here I can refer you to other answers that address the following two questions: Data Contiguity and Row vs. Column Major Ordering. Row vs. Column Major Ordering is best described by its Wikipedia article. So now let's talk about data contiguity. In Python this is generally not so important, so I'm going to jump to C for a moment here.

In C there are two options for storing a 2D array.

  1. An array of arrays
  2. A flattened array

In the first example, the outer array stores pointers: we have a block of memory where each value is a pointer to another block of memory. (Strictly speaking, this is a C array of pointers to rows. A true C array of arrays, such as `int a[3][4]`, is stored contiguously and laid out like a C-contiguous NumPy array.) To find a value at any point we must dereference first the outer array and then the inner array.

In the second example, we have a single block of memory of size rows * columns. We can dereference any index to get its value, but the indices are one-dimensional. A 2D index (x, y), with x the row and y the column, can be converted using y + x * width, where width is the number of columns.
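
The index conversion can be sketched in plain Python. This is only an illustration of the `y + x * width` formula; the helper name `flat_index` and the sample data are made up for the example, and x is assumed to be the row and y the column.

```python
def flat_index(x, y, width):
    """Row-major offset of row x, column y in a grid with `width` columns."""
    return y + x * width


# A 2x3 grid stored two ways: nested lists and one flattened list.
nested = [[10, 11, 12],
          [20, 21, 22]]
width = 3
flat = [value for row in nested for value in row]

# Looking up row 1, column 2 gives the same element either way.
offset = flat_index(1, 2, width)
from_flat = flat[offset]
from_nested = nested[1][2]
print(offset, from_flat, from_nested)
```

For column-major (Fortran) order the roles swap: the offset becomes `x + y * height`, where height is the number of rows.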

When doing numerical calculations, we strive to use contiguous arrays. The reason for this is cache efficiency, which numpy relies on. If I want to add the value a to each value in a 2D array, the entire flattened array can be moved into the cache if it fits. For the array of arrays described above, however, only a single row (or column) at a time is a contiguous block that can be moved into the cache. If you want to know more, look up SIMD [single instruction, multiple data].
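
As a rough illustration of contiguous versus non-contiguous data in numpy itself, the sketch below checks the flags of a row and a column taken from a C-ordered array. The array `grid` and its shape are arbitrary choices for the example; no timings are shown, since those depend on the machine.

```python
import numpy as np

grid = np.arange(12, dtype=np.int64).reshape(3, 4)  # C-contiguous by default

row = grid[1, :]     # elements are adjacent in memory
column = grid[:, 1]  # elements are one full row (4 * 8 bytes) apart

row_contiguous = bool(row.flags['C_CONTIGUOUS'])
column_contiguous = bool(column.flags['C_CONTIGUOUS'])

# ascontiguousarray copies the column into its own packed buffer.
packed = np.ascontiguousarray(column)
packed_contiguous = bool(packed.flags['C_CONTIGUOUS'])

print(row.strides, column.strides, packed.strides)
print(row_contiguous, column_contiguous, packed_contiguous)
```

The column is a view into `grid` with a large stride, while `packed` is an independent copy whose elements sit next to each other.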

Miladiouss
esdanol
  • You're mixing up how a C array of arrays works and how a Java array of arrays works. Your "array of arrays" image is from a Java course. – user2357112 Aug 21 '18 at 22:41
  • Yes, it is from a Java course, but the concept is the same. You have a C array of C arrays, in essence a pointer to a block of memory containing several pointers to other blocks of memory. The memory is not contiguous. – esdanol Aug 22 '18 at 16:03
  • That is not how a C array of arrays works. That is how a C pointer into a C array of pointers into more arrays works. A C array of arrays is contiguous, and laid out identically to a C-contiguous NumPy multidimensional array. (That's why it's called C-contiguous.) – user2357112 Aug 22 '18 at 16:14