Efficient grouping in numpy

Question

I have a list of about 10⁶ pairs, where each element of the pair is either -1, 0, or 1:

[
 [ 0,  1],
 [-1, -1],
 [ 0, -1],
 [ 1,  0],
 ...
]

I want to split these pairs into two groups (i.e. lists of pairs) according to whether the first element of the pair is -1 or not¹.

Is there a way to do this efficiently with numpy?

Despite the terminology and notation I used above, I am in fact agnostic about the actual types of the pairs and the "lists" of pairs. Use whatever numpy or python data structure leads to the most efficient solution. (But no pandas, please.)

EDIT:

For example, if the initial list of pairs is

[
 [ 0, -1],
 [ 0, -1],
 [ 1, -1],
 [-1, -1],
 [ 1,  0],
 [-1,  1],
 [-1, -1],
 [ 0,  0],
 [ 0,  1],
 [-1,  0]
]

...an acceptable result would consist of the two lists

[
 [-1, -1],
 [-1,  1],
 [-1, -1],
 [-1,  0]
]

...and

[
 [ 0, -1],
 [ 0, -1],
 [ 1, -1],
 [ 1,  0],
 [ 0,  0],
 [ 0,  1]
]

These two lists preserve the order in which the elements appeared in the original list. That would be my preference, but it is not essential. For example, a solution consisting of

[
 [-1, -1],
 [-1, -1],
 [-1,  0],
 [-1,  1]
]

...and

[
 [ 0, -1],
 [ 0, -1],
 [ 0,  0],
 [ 0,  1],
 [ 1, -1],
 [ 1,  0],
]

...would also be acceptable.
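The second (sorted) acceptable result can also be produced directly with NumPy. This is a minimal sketch, assuming "sorted" means the rows are in lexicographic order: `np.lexsort` orders the rows, and `np.searchsorted` finds where the `-1` rows end. The variable names are illustrative only.

```python
import numpy as np

a = np.array([[0, -1], [0, -1], [1, -1], [-1, -1], [1, 0],
              [-1, 1], [-1, -1], [0, 0], [0, 1], [-1, 0]])

# np.lexsort treats the LAST key as the primary one, so passing
# (second column, first column) sorts by the first column, then the second.
order = np.lexsort((a[:, 1], a[:, 0]))
s = a[order]

# All rows starting with -1 now come first; find the index where they end.
split = np.searchsorted(s[:, 0], -1, side="right")
minus, rest = s[:split], s[split:]

print(minus)
print(rest)
```

Sorting costs O(n log n), while a single boolean-mask pass is O(n), so this route only pays off if sorted groups are actually wanted.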

¹ In other words, all the pairs in one group should have -1 in the first position, and all the pairs in the other group should have either 0 or 1 in the first position.

What do you mean by an efficient solution? Efficient with respect to what? Memory/storage? Computational complexity? — Nikos M., Apr 21 '19 at 11:04
For example, why isn't a simple for loop that splits the list in two efficient? — Nikos M., Apr 21 '19 at 11:05
Please also add the expected output to your question, as it is attracting answers that don't seem to be what you were expecting. — Austin, Apr 21 '19 at 11:05
Possible duplicate of [Is there any numpy group by function?](https://stackoverflow.com/questions/38013778/is-there-any-numpy-group-by-function) — jeremycg, Apr 21 '19 at 11:09
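The comments ask what "efficient" should mean here. One way to make it concrete is to time a plain Python loop against NumPy boolean indexing on the same 10⁶ pairs. This is a minimal sketch, assuming randomly generated test data; the seed and the helper names `split_loop` and `split_mask` are hypothetical, and no timings are claimed here, so run it on your own machine.

```python
import timeit

import numpy as np

# Hypothetical test data: 10**6 pairs with entries drawn from {-1, 0, 1}.
rng = np.random.default_rng(0)
a = rng.integers(-1, 2, size=(10**6, 2))
pairs = a.tolist()  # the same data as a plain Python list of lists


def split_loop(pairs):
    """Plain Python loop: one pass, appending each pair to one of two lists."""
    minus, rest = [], []
    for p in pairs:
        (minus if p[0] == -1 else rest).append(p)
    return minus, rest


def split_mask(a):
    """NumPy boolean mask: the comparison and both selections run in compiled code."""
    mask = a[:, 0] == -1
    return a[mask], a[~mask]


# Sanity check: both approaches give the same groups in the same order.
m1, r1 = split_loop(pairs)
m2, r2 = split_mask(a)
assert m1 == m2.tolist() and r1 == r2.tolist()

for name, stmt in [("loop", lambda: split_loop(pairs)),
                   ("mask", lambda: split_mask(a))]:
    t = timeit.timeit(stmt, number=3) / 3
    print(f"{name}: {t * 1e3:.1f} ms per call")
```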

score 2 · Accepted Answer · answered Apr 21 '19 at 11:31

How about just applying the condition twice, once to select the rows whose first element is not -1 and once to select those whose first element is -1:

import numpy as np

a = np.array([[ 0, -1], [ 0, -1], [ 1, -1], [-1, -1], [ 1,  0],
              [-1,  1], [-1, -1], [ 0,  0], [ 0,  1], [-1,  0]])

pos = a[a[:, 0] != -1]  # rows whose first element is 0 or 1
neg = a[a[:, 0] == -1]  # rows whose first element is -1

print(pos)
# [[ 0 -1]
#  [ 0 -1]
#  [ 1 -1]
#  [ 1  0]
#  [ 0  0]
#  [ 0  1]]

print(neg)
# [[-1 -1]
#  [-1  1]
#  [-1 -1]
#  [-1  0]]

Sheldore

Is conditional indexing faster than my answer below? – freude Apr 21 '19 at 11:47
@freude: I wouldn't call it the same, and yes, it is indeed different. The indexing that you and I used is, to my mind, the straightforward approach. I was puzzled, though, as to why you didn't use indexing for the positives and resorted to `delete` instead. Regarding your first comment, I haven't performed any timing study. Feel free to do so. – Sheldore Apr 21 '19 at 11:56
I have shown two equivalent approaches without preferring either of them. I am pretty sure the timings are the same. But if you think that you have contributed significantly to the discussion, so be it. That is fine. – freude Apr 21 '19 at 12:06
In this binary grouping we could construct one `mask=a[:,0]==-1` and apply `a[mask]` and `a[~mask]`. The speed is somewhat better. – hpaulj Apr 21 '19 at 15:28
@hpaulj: Thanks for the nice suggestion – Sheldore Apr 21 '19 at 15:41
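hpaulj's suggestion, spelled out as a minimal sketch on the question's example data: the comparison is computed once, and `~mask` (element-wise NOT) selects the complementary rows. No speed figures are claimed here.

```python
import numpy as np

a = np.array([[0, -1], [0, -1], [1, -1], [-1, -1], [1, 0],
              [-1, 1], [-1, -1], [0, 0], [0, 1], [-1, 0]])

# Build the boolean mask once ...
mask = a[:, 0] == -1
# ... then take the rows where it is True, and the rows where it is False.
neg = a[mask]
pos = a[~mask]

print(neg)
print(pos)
```

Boolean indexing returns copies and keeps the original row order, so this gives the order-preserving result the question prefers.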

Raphael · Answer 2 · score 0 · 2019-04-21T11:21:13.503

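import numpy as np
a = np.random.randint(-1, 2, size=(10, 2))  # random input, so your output will differ

print(a)
# [[ 0  0]
#  [ 1  1]
#  [ 1  1]
#  [-1 -1]
#  [ 0 -1]
#  [ 1  1]
#  [-1  1]
#  [-1  0]
#  [ 1 -1]
#  [ 1  1]]

# For each possible first value c, collect the rows that start with c (a Python-level loop)
minus, zero, one = [np.array([r for r in a if r[0] == c]) for c in [-1, 0, 1]]

print(minus)
# [[-1 -1]
#  [-1  1]
#  [-1  0]]
print(zero)
# [[ 0  0]
#  [ 0 -1]]
print(one)
# [[ 1  1]
#  [ 1  1]
#  [ 1  1]
#  [ 1 -1]
#  [ 1  1]]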
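The list comprehension above walks the rows in Python, which gets slow at 10⁶ rows. A vectorized sketch of the same per-value split, generalized to whatever values occur in the first column, uses a stable `np.argsort` plus `np.unique(..., return_index=True)` to find where each group starts. The function name `group_by_first_column` is hypothetical.

```python
import numpy as np


def group_by_first_column(a):
    """Hypothetical helper: map each distinct first-column value to its rows.

    A stable sort keeps rows with equal keys in their original order.
    """
    order = np.argsort(a[:, 0], kind="stable")
    s = a[order]
    # On the sorted keys, return_index gives the start of each run of equal values.
    keys, starts = np.unique(s[:, 0], return_index=True)
    groups = np.split(s, starts[1:])  # cut the sorted rows at each group start
    return dict(zip(keys.tolist(), groups))


a = np.array([[0, -1], [0, -1], [1, -1], [-1, -1], [1, 0],
              [-1, 1], [-1, -1], [0, 0], [0, 1], [-1, 0]])
groups = group_by_first_column(a)
for key, rows in groups.items():
    print(key, rows.tolist())
```

For the question's two-way split, a single boolean mask is simpler; this form is only useful when every distinct value needs its own group.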

edited Apr 21 '19 at 11:21

answered Apr 21 '19 at 11:06

Raphael

1,731
2
7
23

Narcisse Doudieu Siewe · Answer 3 · score -1 · 2019-04-21T16:20:32.807

You can do it yourself! The only efficiency gain I see is from a generator or something similar, which saves memory at the cost of computation time.

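def sanitize(yllp):  # yllp: list-like of pairs
    y = yield  # the first next() stops here; send() then delivers the wanted group
    yield      # pause again so that send() returns None instead of the first match
    for x in yllp:
        # keep x if its first element belongs to the requested group
        if (x[0] in {0, 1} and y != -1) or x[0] == -1 == y:
            yield x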

Example:

L = [
     (-1,1), 
     (0,1), 
     (0,1), 
     (-1,1), 
     (-1,0), 
     (-1,-1), 
     (0,0), 
     (1,0)
    ]

#get list starting by 0 or 1
w=sanitize(L)    
w.next()
w.send(0)
for i in w:print(i)

#get list starting by -1
t=sanitize(L)
t.next()
t.send(-1)
for i in t:print(i)
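If memory is the concern, the standard library already provides lazy filtering without the `send()` handshake. This is a minimal sketch using the built-in `filter` and `itertools.filterfalse`, a swapped-in alternative rather than this answer's method; the predicate name `starts_with_minus_one` is hypothetical. Each iterator walks `L` independently, so `L` must be re-iterable (a list, not a one-shot generator).

```python
from itertools import filterfalse

L = [(-1, 1), (0, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, 0), (1, 0)]


def starts_with_minus_one(pair):
    """Hypothetical predicate: True when the pair's first element is -1."""
    return pair[0] == -1


minus = filter(starts_with_minus_one, L)      # lazy: pairs starting with -1
rest = filterfalse(starts_with_minus_one, L)  # lazy: pairs starting with 0 or 1

print(list(minus))  # [(-1, 1), (-1, 1), (-1, 0), (-1, -1)]
print(list(rest))   # [(0, 1), (0, 1), (0, 0), (1, 0)]
```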

edited Apr 21 '19 at 16:20

answered Apr 21 '19 at 11:52

Narcisse Doudieu Siewe

1,074
1
7
9

Your `list(w)` produces the `not -1` group, but where's the other? – hpaulj Apr 21 '19 at 15:41
For the other group it is simple: `w=sanitize(L)`, `w.next()`, `w.send(-1)`, then `for i in w: print(i)`. I have just edited the code. – Narcisse Doudieu Siewe Apr 21 '19 at 16:10
