Is there a way to make a list of arrays, sorted out from first column?

Question

I need to build a list of arrays grouped by the value in the first column, using NumPy. I have the following array, x. The first column determines how the rows should be grouped; here it contains the values 0, 1, 3 and 4.

x = np.array([[0,0,0,0,3],
              [1,0,0,2,3],
              [4,0,0,0,0],
              [3,0,0,0,2],
              [0,0,0,0,3],
              [1,0,0,2,3]])

I found out how to sort the array:

data = x[np.argsort(x[:, 0])]
print(data)
[[0 0 0 0 3]
 [0 0 0 0 3]
 [1 0 0 2 3]
 [1 0 0 2 3]
 [3 0 0 0 2]
 [4 0 0 0 0]]

But the output has to be a list of arrays, one per group, like this:

list_of_arrays = np.array([[[0,0,0,3],
                            [0,0,0,3]], 
                           [[0,0,2,3],[0,0,2,3]], 
                           [[0,0,0,2]],
                           [[0,0,0,0]]])

So the first column acts as a marker for how the rows should be grouped in the list. I'm pretty new to Python and coding in general, so any help is much appreciated.

Jussi Nurminen · Answer 1 · 2020-10-14T09:58:39.280

0

One-liner:

[x[np.where(x[:, 0] == y)][:, 1:] for y in np.unique(x[:, 0])]

[array([[0, 0, 0, 3],
        [0, 0, 0, 3]]),
 array([[0, 0, 2, 3],
        [0, 0, 2, 3]]),
 array([[0, 0, 0, 2]]),
 array([[0, 0, 0, 0]])]

Explanation: this is a list comprehension in which np.unique() first picks all unique values from the first matrix column, in sorted order. The comprehension goes through those values, and np.where finds the indices of the rows starting with each value. Indexing x with those indices yields a 2-D array of the matching rows, and the slice expression [:, 1:] then removes the first column from each group.
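The same grouping can also be done without a Python-level loop over the unique values. The following is a minimal sketch, assuming the rows are first sorted by the first column as in the question: np.unique with return_index=True gives the position where each group starts, and np.split cuts the sorted array at those positions. The variable names are illustrative only.

```python
import numpy as np

x = np.array([[0, 0, 0, 0, 3],
              [1, 0, 0, 2, 3],
              [4, 0, 0, 0, 0],
              [3, 0, 0, 0, 2],
              [0, 0, 0, 0, 3],
              [1, 0, 0, 2, 3]])

# Stable sort so rows with the same key keep their original relative order.
data = x[np.argsort(x[:, 0], kind="stable")]

# For sorted input, return_index gives the first row of each group.
_, starts = np.unique(data[:, 0], return_index=True)

# Cut at every group boundary except the leading 0, and drop column 0.
groups = np.split(data[:, 1:], starts[1:])

for g in groups:
    print(g)
```

Here groups is a plain Python list of 2-D arrays, one per distinct value in the first column.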

edited Oct 14 '20 at 09:58

answered Oct 14 '20 at 09:11


I highly doubt that this will do the grouping as requested above. – Patrick Artner Oct 14 '20 at 09:18
there is not a third dimension - the results are grouped by the 1st (now removed) order number – Patrick Artner Oct 14 '20 at 09:31
ah, now I understand – Jussi Nurminen Oct 14 '20 at 09:47

score 0 · Answer 2 · answered Oct 14 '20 at 09:44

You would need to group by the first value; then you can create new arrays with fromiter, reshape them accordingly, and reassemble them into an array:

import numpy as np
from itertools import chain, groupby, tee

x = np.array([[0,0,0,0,3],
              [1,0,0,2,3],
              [4,0,0,0,0],
              [3,0,0,0,2],
              [0,0,0,0,3],
              [1,0,0,2,3]])


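def fromiter2d_drop_first(it, dtype):
    # modified from
    # https://stackoverflow.com/a/45738134/7505395
    # Duplicate the iterator: one copy counts the rows, the other supplies the values.
    it, it2 = tee(it)
    length = sum(1 for _ in it2)

    # Flatten all rows into one 1-D array, then restore the row structure.
    flattened = chain.from_iterable(it)
    array_1d = np.fromiter(flattened, dtype)
    array_2d = np.reshape(array_1d, (length, -1))
    # Drop the first (grouping) column.
    return array_2d[:, 1:]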


data = x[np.argsort(x[:, 0])]
groups = np.array([fromiter2d_drop_first(v, int)
                   for _, v in groupby(data, key=lambda row: row[0])], dtype=object)
print(groups)

Output:

[array([[0, 0, 0, 3], [0, 0, 0, 3]])
 array([[0, 0, 2, 3], [0, 0, 2, 3]]) 
 array([[0, 0, 0, 2]])
 array([[0, 0, 0, 0]])]
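One caveat worth illustrating: np.array(..., dtype=object) gives a 1-D array of sub-arrays here only because the groups have different numbers of rows. If every group happened to have the same shape, NumPy would build a single multi-dimensional array instead. The sketch below shows one way to get a 1-D object array regardless, by preallocating with np.empty and assigning element by element. The helper name to_object_array is hypothetical and not part of NumPy.

```python
import numpy as np


def to_object_array(arrays):
    """Pack a sequence of arrays into a 1-D object array (hypothetical helper)."""
    out = np.empty(len(arrays), dtype=object)
    for i, a in enumerate(arrays):
        out[i] = a  # each slot holds a reference to one sub-array
    return out


# Two groups that happen to have the same shape.
same_size = [np.zeros((2, 4), dtype=int), np.ones((2, 4), dtype=int)]

direct = np.array(same_size, dtype=object)
packed = to_object_array(same_size)

print(direct.shape)  # (2, 2, 4)
print(packed.shape)  # (2,)
```

If a plain Python list of arrays is acceptable, skipping the outer np.array call avoids the issue entirely.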

score 0 · Answer 3 · answered Oct 14 '20 at 10:01

Solution:
This might help you out:

import numpy as np
from itertools import groupby
from operator import itemgetter

x = np.array([[0,0,0,0,3],
              [1,0,0,2,3],
              [4,0,0,0,0],
              [3,0,0,0,2],
              [0,0,0,0,3],
              [1,0,0,2,3]])

def get_arr_lists(iterr):
    tmp_list = []
    for val in iterr:
        tmp_list.append(val[1:])
    return tmp_list

data = x[np.argsort(x[:, 0])]

final_arr = [get_arr_lists(iterr) for _, iterr in groupby(data, key=itemgetter(0))]

print(final_arr)

Output:

[
    [array([0, 0, 0, 3]), array([0, 0, 0, 3])], 
    [array([0, 0, 2, 3]), array([0, 0, 2, 3])], 
    [array([0, 0, 0, 2])], 
    [array([0, 0, 0, 0])]
]
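The result above is a list of lists of 1-D arrays. If one 2-D array per group is preferred, and it is useful to keep the grouping value that was dropped, the groupby loop can be adapted roughly as follows. This is only a sketch: the dictionary layout and the name by_key are assumptions for illustration, not something the question asked for.

```python
import numpy as np
from itertools import groupby
from operator import itemgetter

x = np.array([[0, 0, 0, 0, 3],
              [1, 0, 0, 2, 3],
              [4, 0, 0, 0, 0],
              [3, 0, 0, 0, 2],
              [0, 0, 0, 0, 3],
              [1, 0, 0, 2, 3]])

# groupby only merges adjacent rows, so sort by the first column first.
data = x[np.argsort(x[:, 0], kind="stable")]

# Map each first-column value to a 2-D array of the remaining columns.
by_key = {int(key): np.stack(list(rows))[:, 1:]
          for key, rows in groupby(data, key=itemgetter(0))}

for key, rows in by_key.items():
    print(key, rows.tolist())
```

Calling list(by_key.values()) recovers the list of arrays in sorted key order.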
