How to extract rows from a NumPy array based on their content?

Question

As the title says: I have a 2D NumPy array, for example the one below,

[[33, 21, 1],
 [33, 21, 2],
 [32, 22, 0],
 [33, 21, 3],
 [34, 34, 1]]

and I want to extract its rows, in order, grouped by the content of the first and second columns. In this case I want to get three different 2D NumPy arrays, as below,

[[33, 21, 1],
 [33, 21, 2],
 [33, 21, 3]]

and

[[32, 22, 0]]

and

[[34, 34, 1]]

Which NumPy function could I use to do this? I think the point is to distinguish rows by their first and second columns: if the elements in these columns are the same, the rows belong in the same output array. I want to write a Python function for this, because I could have a much bigger array than the one above. Feel free to give me advice, thank you.

Angus Williams · Answer 1 · 2016-09-24T07:55:55.043

1

You would use boolean indexing to do this. To obtain the three examples you give (in the same order as you posted them, where x is your original 2d array), you could write:

numpy.atleast_2d( x[ x[:,1]==21 ] )
numpy.atleast_2d( x[ x[:,1]==22 ] )
numpy.atleast_2d( x[ x[:,1]==34 ] )

The first should be interpreted as saying 'extract the rows of x where the element in the second column equals 21', and so on. A page in the SciPy docs explains how indexing works in NumPy. Since you required that the returned arrays all be 2D, I have used the atleast_2d function.
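To avoid writing one mask per group by hand, the same boolean indexing can be wrapped in a loop over the distinct key pairs. The following is only a minimal sketch: the helper name split_by_first_two_columns is hypothetical, and it assumes a NumPy version where np.unique accepts the axis argument (1.13 or later).

```python
import numpy as np

x = np.array([[33, 21, 1],
              [33, 21, 2],
              [32, 22, 0],
              [33, 21, 3],
              [34, 34, 1]])

def split_by_first_two_columns(arr):
    """Return one 2D array per distinct (column 0, column 1) pair.

    Hypothetical helper, not part of NumPy.
    """
    # Distinct key pairs, sorted lexicographically by np.unique
    keys = np.unique(arr[:, :2], axis=0)
    groups = []
    for key in keys:
        # True for rows whose first two columns both match this key
        mask = (arr[:, :2] == key).all(axis=1)
        groups.append(arr[mask])
    return groups

groups = split_by_first_two_columns(x)
for g in groups:
    print(g)
```

Each group keeps its rows in their original order, and the groups come out in sorted key order rather than in order of first appearance. The loop builds one mask per distinct key, so it may become slow when there are many groups.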

edited Sep 24 '16 at 07:55

answered Sep 24 '16 at 07:25

Angus Williams


Thanks for answering. I understand the concept in your answer, but if I have an array with many rows and still need to extract them, I would have to write a lot of boolean indexing expressions. All I want is a function that handles this kind of problem no matter how many rows the array has. – Heinz Sep 24 '16 at 08:05
I'm not sure I understand what you are saying. Your question suggests that you want to extract rows of a 2D array 'based on the content' of some of the columns - boolean indexing is exactly what you need for this kind of problem! It makes no difference how many rows there are in your array. I may have misunderstood you, so perhaps try to phrase your question a little more clearly. – Angus Williams Sep 24 '16 at 08:14

Divakar · Accepted Answer · 2016-09-24T08:27:27.320

1

Here's an approach to handle many such groupings -

import numpy as np

# Sort array based on second column
sorted_a = a[np.argsort(a[:,1])]

# Get the indices where the sorted second col changes value. Split along axis=0 using those.
shift_idx = np.unique(sorted_a[:,1],return_index=True)[1][1:]
out = np.split(sorted_a,shift_idx)

Alternatively, for better performance, we can compute shift_idx like so -

shift_idx = np.flatnonzero(sorted_a[1:,1] > sorted_a[:-1,1])+1
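As a quick check, the two ways of computing shift_idx can be compared on the question's array. This is a sketch only; the names idx_unique and idx_diff are made up for the comparison. It also passes kind='stable' to np.argsort, which is an addition not in the code above, because the default sort kind does not promise to keep rows with equal keys in their original order.

```python
import numpy as np

a = np.array([[33, 21, 1],
              [33, 21, 2],
              [32, 22, 0],
              [33, 21, 3],
              [34, 34, 1]])

# kind='stable' keeps rows with equal keys in their original relative order
sorted_a = a[np.argsort(a[:, 1], kind='stable')]

# Variant 1: first index of each distinct value, dropping the leading 0
idx_unique = np.unique(sorted_a[:, 1], return_index=True)[1][1:]

# Variant 2: positions where the sorted key increases from one row to the next
idx_diff = np.flatnonzero(sorted_a[1:, 1] > sorted_a[:-1, 1]) + 1

print(idx_unique, idx_diff)

out = np.split(sorted_a, idx_diff)
```

Both variants rely on the key column already being sorted, so they have to be applied to sorted_a and not to a.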

Sample run -

In [27]: a
Out[27]: 
array([[33, 21,  1],
       [33, 21,  2],
       [32, 22,  0],
       [33, 21,  3],
       [34, 34,  1]])

In [28]: sorted_a = a[np.argsort(a[:,1])]

In [29]: np.split(sorted_a,np.unique(sorted_a[:,1],return_index=True)[1][1:])
Out[29]: 
[array([[33, 21,  1],
        [33, 21,  2],
        [33, 21,  3]]), array([[32, 22,  0]]), array([[34, 34,  1]])]
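The sample run above keys on the second column only, so rows that share a second-column value but differ in the first column would land in the same group. One possible way to key on both columns is to sort with np.lexsort and start a new group wherever either key column changes. This is a sketch under assumptions: the helper name split_by_columns and its cols parameter are hypothetical, and the extra row [35, 21, 7] is made-up test data.

```python
import numpy as np

def split_by_columns(a, cols=(0, 1)):
    """Split the rows of a 2D array into groups sharing the same values in `cols`.

    Hypothetical helper; groups are returned in sorted key order.
    """
    keys = a[:, list(cols)]
    # np.lexsort treats the LAST key as primary, so reverse the key order
    order = np.lexsort(keys.T[::-1])
    sorted_a = a[order]
    sorted_keys = keys[order]
    # A new group starts wherever any key column differs from the previous row
    changed = (sorted_keys[1:] != sorted_keys[:-1]).any(axis=1)
    return np.split(sorted_a, np.flatnonzero(changed) + 1)

b = np.array([[33, 21, 1],
              [33, 21, 2],
              [32, 22, 0],
              [33, 21, 3],
              [34, 34, 1],
              [35, 21, 7]])  # same second column as the 33/21 rows, different first column

groups = split_by_columns(b)
for g in groups:
    print(g)
```

np.lexsort is a stable sort, so rows within a group stay in their original order. Unlike the sample run, the groups here are ordered by the first column and then by the second.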

edited Sep 24 '16 at 08:27

answered Sep 24 '16 at 08:18

Divakar


Thanks for your answer, I think it is good and solves my problem, but I have another related question: how do I split rows that have the same element in the second column but different elements in the first? – Heinz Sep 24 '16 at 09:44
@Heinz I would suggest posting another question on the same, as that might involve a few significant changes to the posted solutions. – Divakar Sep 24 '16 at 09:50
Ok, I will update the question, thank you for the advice. – Heinz Sep 24 '16 at 12:11
@Heinz I meant it would be better to see those as a new question, as there are considerable changes and accordingly the solutions would need a good amount of changes. – Divakar Sep 29 '16 at 13:30
Thank you for the advice, I posted the updated question as a new post: http://stackoverflow.com/questions/39771934/how-to-extract-arrays-from-an-arranged-numpy-array – Heinz Sep 29 '16 at 13:42

score 1 · Answer 3 · answered Sep 29 '16 at 13:37

1

The numpy_indexed package (disclaimer: I am its author) contains functionality to efficiently perform this type of operation:

import numpy_indexed as npi
npi.group_by(a[:, :2]).split(a)

answered Sep 29 '16 at 13:37

Eelco Hoogendoorn


Thank you for the advice, I posted the updated question as a new post: http://stackoverflow.com/questions/39771934/how-to-extract-arrays-from-an-arranged-numpy-array Could you answer the question there? – Heinz Sep 29 '16 at 13:41
