How do I access the ith column of a NumPy multidimensional array?

Question

Given:

test = np.array([[1, 2], [3, 4], [5, 6]])

test[i] gives the ith row (e.g. [1, 2]). How do I access the ith column? (e.g. [1, 3, 5]). Also, would this be an expensive operation?

score 968 · Accepted Answer · edited Jun 13 '22 at 07:43


To access column 0:

>>> test[:, 0]
array([1, 3, 5])

To access row 0:

>>> test[0, :]
array([1, 2])

This is covered in Section 1.4 (Indexing) of the NumPy reference. This is quick, at least in my experience. It's certainly much quicker than accessing each element in a loop.
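Here is a minimal sketch of the general case (column i instead of column 0). It also shows why the operation is cheap: `test[:, i]` uses basic indexing, so it returns a view of the existing data instead of copying it. The element-by-element loop is included only for comparison; `np.shares_memory` is used here to confirm the view.

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

for i in range(test.shape[1]):
    col = test[:, i]                              # column i via basic indexing
    looped = np.array([row[i] for row in test])   # element-by-element equivalent
    assert np.array_equal(col, looped)
    print(i, col)

# The column is a view: it shares memory with test rather than copying it
print(np.shares_memory(test, test[:, 0]))  # True
```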

answered Dec 15 '10 at 21:35 · mtrw · edited by Mateen Ulhaq


This creates a copy. Is it possible to get a reference to a column instead, so that any change made through the reference is reflected in the original array? – harmands Oct 18 '16 at 14:21

More helpful section of the docs: [Dimensional indexing tools](https://numpy.org/doc/stable/user/basics.indexing.html#dimensional-indexing-tools) – RapidIce Feb 01 '23 at 02:17

score 103 · Answer 2 · edited Nov 03 '19 at 15:05

>>> test[:,0]
array([1, 3, 5])

This gives you a 1-D array of shape (3,), which behaves like a row vector. If you just want to loop over it, that's fine, but if you want to hstack it with another array of shape 3xN, you will get

ValueError: all the input arrays must have same number of dimensions

while

>>> test[:,[0]]
array([[1],
       [3],
       [5]])

gives you a column vector of shape (3, 1), so you can use it in concatenate or hstack operations.

For example:

>>> np.hstack((test, test[:,[0]]))
array([[1, 2, 1],
       [3, 4, 3],
       [5, 6, 5]])
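A short sketch comparing the ways to get column 0 and the shape each one produces. The slice and `np.newaxis` forms are not from the answer above; they are standard NumPy basic indexing and, unlike `test[:, [0]]`, return views rather than copies. `np.column_stack` is shown as an alternative that accepts the 1-D result directly.

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

col_1d = test[:, 0]                   # shape (3,): an integer index drops the axis
col_list = test[:, [0]]               # shape (3, 1): a list index keeps it (copy)
col_slice = test[:, 0:1]              # shape (3, 1): a slice keeps it (view)
col_newaxis = test[:, 0, np.newaxis]  # shape (3, 1): re-adds the axis (view)

print(col_1d.shape, col_list.shape, col_slice.shape, col_newaxis.shape)

# np.column_stack treats 1-D inputs as columns, so the 1-D result works directly
stacked = np.column_stack((test, col_1d))
print(stacked)
```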

score 93 · Answer 3 · answered Apr 20 '13 at 14:05


And if you want to access more than one column at a time, you can do:

>>> test = np.arange(9).reshape((3,3))
>>> test
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> test[:,[0,2]]
array([[0, 2],
       [3, 5],
       [6, 8]])

Akavall


`test[:,[0,2]]` just accesses the data, e.g. `test[:, [0,2]] = something` would modify test and not create another array. But `copy_test = test[:, [0,2]]` does in fact create a copy, as you say. – Akavall Apr 11 '14 at 16:19
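The comment's distinction between reading and assigning can be checked directly. This is a minimal sketch on the same 3x3 array; the name `copy_test` comes from the comment, and the assigned values (-1 and 0) are arbitrary.

```python
import numpy as np

test = np.arange(9).reshape((3, 3))

# Reading with a list index creates a new array, so changes to it stay local
copy_test = test[:, [0, 2]]
copy_test[:] = -1
print(test[0])  # [0 1 2]: test is unchanged

# Assigning through the same index writes into test itself
test[:, [0, 2]] = 0
print(test)
# [[0 1 0]
#  [0 4 0]
#  [0 7 0]]
```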

score 25 · Answer 4 · answered Jan 26 '16 at 09:55


You could also transpose and return a row:

In [4]: test.T[0]
Out[4]: array([1, 3, 5])
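Transposing does not copy anything: `.T` returns a view with the shape and strides reversed, so `test.T[0]` is still a view of the original data. A small sketch (the write of 99 is only there to demonstrate the shared memory):

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

row_of_transpose = test.T[0]  # same elements as test[:, 0]
print(row_of_transpose)       # [1 3 5]

# .T swaps shape and strides without copying, so this is still a view
print(test.T.shape, test.T.strides == test.strides[::-1])  # (2, 3) True

row_of_transpose[0] = 99      # writes through to the original array
print(test[0, 0])             # 99
```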

Hotschke

Andreas K. · Answer 5 · 2020-11-23T20:13:58.243

Although the question has been answered, let me mention some nuances.

Let's say you are interested in the second column (index 1) of the array

arr = numpy.array([[1, 2],
                   [3, 4],
                   [5, 6]])

As you already know from other answers, to get it in the form of "row vector" (array of shape (3,)), you use slicing:

arr_col1_view = arr[:, 1]         # creates a view of the column at index 1 of arr
arr_col1_copy = arr[:, 1].copy()  # creates a copy of the column at index 1 of arr

To check if an array is a view or a copy of another array you can do the following:

arr_col1_view.base is arr  # True
arr_col1_copy.base is arr  # False

See ndarray.base.

Besides the obvious difference between the two (modifying arr_col1_view will affect arr), the number of bytes you have to step in memory to get from one element to the next (the stride) is different:

arr_col1_view.strides[0]  # 8 bytes with 4-byte int32 elements (16 with int64)
arr_col1_copy.strides[0]  # 4 bytes with int32 elements (8 with int64)

See ndarray.strides and this answer.
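The stride values depend on the element size, and NumPy's default integer type is platform dependent (for example, it was 32-bit on Windows before NumPy 2.0). This sketch fixes the dtype explicitly so the numbers are reproducible; it assumes nothing beyond standard NumPy.

```python
import numpy as np

for dtype in (np.int32, np.int64):
    arr = np.array([[1, 2], [3, 4], [5, 6]], dtype=dtype)
    col_view = arr[:, 1]         # column view: each step skips a whole row
    col_copy = arr[:, 1].copy()  # contiguous copy: each step is one element
    print(dtype.__name__, arr.itemsize, col_view.strides, col_copy.strides)
# int32 4 (8,) (4,)
# int64 8 (16,) (8,)
```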

Why is this important? Imagine that you have a very big array A instead of the arr:

A = np.random.randint(2, size=(10000, 10000), dtype='int32')
A_col1_view = A[:, 1] 
A_col1_copy = A[:, 1].copy()

and you want to compute the sum of all the elements of that column, i.e. A_col1_view.sum() or A_col1_copy.sum(). Using the copied version is much faster:

%timeit A_col1_view.sum()  # ~248 µs
%timeit A_col1_copy.sum()  # ~12.8 µs

This is due to the different strides mentioned before:

A_col1_view.strides[0]  # 40000 bytes
A_col1_copy.strides[0]  # 4 bytes

Although it might seem that using column copies is better, that is not always true, because making a copy also takes time and uses more memory (in this case it took me approx. 200 µs to create A_col1_copy). However, if we need the copy in the first place, or we need to do many different operations on a specific column of the array and are OK with trading memory for speed, then making a copy is the way to go.

If we are mostly interested in working with columns, it could be a good idea to create our array in column-major ('F') order instead of the row-major ('C') order (which is the default), and then do the slicing as before to get a column without copying it:

A = np.asfortranarray(A)   # or np.array(A, order='F')
A_col1_view = A[:, 1]
A_col1_view.strides[0]     # 4 bytes

%timeit A_col1_view.sum()  # ~12.6 µs vs ~248 µs

Now, performing the sum operation (or any other) on a column-view is as fast as performing it on a column copy.
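A scaled-down sketch of the same comparison, assuming a 2000 x 2000 int32 array (smaller than the answer's example, to keep memory use modest). The strides and contiguity flags are deterministic; the timings depend on the machine, so no expected figures are given for them.

```python
import timeit

import numpy as np

A = np.random.randint(2, size=(2000, 2000), dtype='int32')
A_f = np.asfortranarray(A)  # column-major copy of the same data

col_c = A[:, 1]    # strided view into a C-order array
col_f = A_f[:, 1]  # view into an F-order array: elements are adjacent

print(col_c.strides, col_c.flags['C_CONTIGUOUS'])  # (8000,) False
print(col_f.strides, col_f.flags['C_CONTIGUOUS'])  # (4,) True
print(col_c.sum() == col_f.sum())                  # True

for name, col in (("C-order view", col_c), ("F-order view", col_f)):
    t = timeit.timeit(col.sum, number=1000)
    print(f"{name}: {t / 1000 * 1e6:.1f} µs per sum")
```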

Finally, let me note that transposing an array and using row-slicing is the same as using column-slicing on the original array, because transposing just swaps the shape and the strides of the original array without copying the data.

A[:, 1].strides[0]    # 40000 bytes
A.T[1, :].strides[0]  # 40000 bytes

score 9 · Answer 6 · answered Mar 07 '18 at 10:43


To get several independent columns, just use:

>>> test[:,[0,2]]

and you will get columns 0 and 2.

Alberto Perez


How is this any different from Akavall's [answer](https://stackoverflow.com/a/16121210/369450)? – Uyghur Lives Matter Sep 29 '18 at 19:01

score 4 · Answer 7 · answered Nov 21 '15 at 06:59


>>> test
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

>>> ncol = test.shape[1]
>>> ncol
5

Then you can select the 2nd through 4th columns this way:

>>> test[0:, 1:(ncol - 1)]
array([[1, 2, 3],
       [6, 7, 8]])
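The answer does not show how `test` was created; `np.arange(10).reshape(2, 5)` below is an assumption that reproduces the printed array. The sketch also shows that a negative stop index gives the same selection without computing `ncol`, and that the result is a view.

```python
import numpy as np

test = np.arange(10).reshape(2, 5)  # assumed construction of the 2 x 5 array
ncol = test.shape[1]                # 5

middle = test[:, 1:(ncol - 1)]  # columns 1, 2, 3 (the stop index is exclusive)
same = test[:, 1:-1]            # a negative stop counts from the end
print(middle)
print(np.array_equal(middle, same))    # True
print(np.shares_memory(test, middle))  # True: slicing returns a view
```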

mac

score 4 · Answer 8 · answered Jul 18 '21 at 05:02


This is not multidimensional; it is a 2-dimensional array, in which you want to access the columns you wish:

test = numpy.array([[1, 2], [3, 4], [5, 6]])
test[:, a:b]  # replace a and b with the start (inclusive) and stop (exclusive) column indices

dinesh kumar


`2` is a 'multi'. `multidimensional` is not limited to 3 or 4 or more. The base array class in `numpy` is `ndarray`, where the `n` stands for any number from 0 up. 2 dimensional is not a special case, except that it fits our intuitions about rows and columns the best. – hpaulj Jan 09 '22 at 17:32

score 0 · Answer 9 · answered Apr 18 '23 at 06:55

This question has been answered, but here is a note on views vs. copies.

If the array is indexed using scalars and slices (regular, also called basic, indexing), the result is a view (x below), which means any change made to x is reflected in test, because x is just a different view of the same data.

test = np.array([[1, 2], [3, 4], [5, 6]])
# select second column
x = test[:, 1]
x[:] = 100        # <---- this does affect test

test
array([[  1, 100],
       [  3, 100],
       [  5, 100]])

However, if the array is indexed using a list/array-like (advanced indexing), the result is a copy, which means any changes to x will not affect test.

test = np.array([[1, 2], [3, 4], [5, 6]])
# select second column
x = test[:, [1]]
x[:] = 100        # <---- this does not affect test

test
array([[1, 2],
       [3, 4],
       [5, 6]])

In general, using a slice to index will return a view:

test = np.array([[1, 2], [3, 4], [5, 6]])
x = test[:, :2]
x[:] = 100

test
array([[100, 100],
       [100, 100],
       [100, 100]])

but using an array to index will return a copy:

test = np.array([[1, 2], [3, 4], [5, 6]])
x = test[:, np.r_[:2]]
x[:] = 100

test
array([[1, 2],
       [3, 4],
       [5, 6]])

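Regular (basic) indexing is extremely fast because it only creates a new view of the existing data; advanced indexing is slower because it has to copy the selected elements (that said, for arrays of modest size it is still almost instantaneous and is unlikely to be a bottleneck in the program).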
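To check which kind of result you got without modifying anything, `np.shares_memory` can be used. A minimal sketch over the four indexing expressions from this answer:

```python
import numpy as np

test = np.array([[1, 2], [3, 4], [5, 6]])

cases = {
    "test[:, 1]": test[:, 1],                  # basic indexing
    "test[:, :2]": test[:, :2],                # basic indexing
    "test[:, [1]]": test[:, [1]],              # advanced indexing
    "test[:, np.r_[:2]]": test[:, np.r_[:2]],  # advanced indexing
}
for expr, result in cases.items():
    # True for views of test, False for independent copies
    print(f"{expr:<20} view={np.shares_memory(test, result)}")
# test[:, 1]           view=True
# test[:, :2]          view=True
# test[:, [1]]         view=False
# test[:, np.r_[:2]]   view=False
```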

score 0 · Answer 10 · answered Jun 30 '23 at 07:30

I just want to point out that the comment by harmands under mtrw's top-scoring answer is misleading. He says:

This creates a copy. Is it possible to get a reference to a column instead, so that any change made through the reference is reflected in the original array?

But in fact this code

import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])
barr = arr[:, 1]   # basic indexing: barr is a view of the column at index 1
print(barr)
barr[1] = 8        # writes through the view into arr
print(arr)

prints [2 4 6] followed by

[[1 2]
 [3 8]
 [5 6]]

I would appreciate it if someone could note this in the comments under mtrw's answer, as my reputation is still too low to comment.
