NumPy: create N new arrays based on an array index

Question

I'm trying to create new arrays based on an index, which is the first element in each row. I feel like I'm missing something really simple here.

The array looks like this; the first number in each row is the index.

[[ 1  0  1  2  3  4]
 [ 1  5  6  7  8  9]
 [ 2 10 11 12 13 14]
 [ 2 15 16 17 18 19]
 [ 4 20 21 22 23 24]]

The outcome I would like is something like this:

Array 1:

range1 =
[[ 1  0  1  2  3  4]
 [ 1  5  6  7  8  9]]

Array 2:

range2 =
[[ 2 10 11 12 13 14]
 [ 2 15 16 17 18 19]]

Array 3:

range3 =
[[ 4 20 21 22 23 24]]

This is the code I currently have, but there can be N possible index values and I obviously can't write an if statement for each of them. I was planning on collecting the rows in a list and then converting that list into a NumPy array. I've also looked at zipping them before using hstack, but I couldn't get that to work either.

import numpy as np

data = np.arange(25).reshape(5,5)
indexList = np.array(([[1,1,2,2,4]]))
indexList = np.transpose(indexList)
array = np.hstack((indexList, data))

range1 = []
range2 = []
range3 = []
for row in array:
    if row[0] == 1:
        range1.append(row)
    if row[0] == 2:
        range2.append(row)
    if row[0] == 3:
        range3.append(row)

score 1 · Answer 1 · answered Apr 21 '21 at 22:09

You're trying to essentially do a group-by in numpy, and there isn't a great solution to that within numpy itself (though you can take a look at some answers to similar questions).

I'd transform the array to a pandas dataframe, since these are nice for groupby operations, get each group's values, and assign them to a dictionary key. You can then access them like you would any other value in a dict:

import pandas as pd
df = pd.DataFrame(array)
gb = df.groupby(0)
dict_of_arrays = {f"range{g}": gb.get_group(g).to_numpy() for g in gb.groups.keys()}

>>> print(dict_of_arrays["range1"])
[[1 0 1 2 3 4]
 [1 5 6 7 8 9]]

>>> print(dict_of_arrays["range2"])
[[ 2 10 11 12 13 14]
 [ 2 15 16 17 18 19]]

>>> print(dict_of_arrays["range4"])
[[ 4 20 21 22 23 24]]
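
If you'd rather stay in NumPy, a rough sketch of the same group-by is to sort by the first column and cut the array wherever the index value changes. This is just one possible approach, not a built-in group-by. The names index_col, order, sorted_rows, keys, first_pos and groups are my own, and the dict keys simply mirror the "range{g}" naming used above.

```python
import numpy as np

# Rebuild the array from the question: index column stacked onto the data.
data = np.arange(25).reshape(5, 5)
index_col = np.array([[1, 1, 2, 2, 4]]).T
array = np.hstack((index_col, data))

# A stable sort keeps rows with the same index in their original order.
order = np.argsort(array[:, 0], kind="stable")
sorted_rows = array[order]

# return_index gives the position where each distinct index value first appears.
keys, first_pos = np.unique(sorted_rows[:, 0], return_index=True)

# Split just before the start of every group except the first one.
groups = np.split(sorted_rows, first_pos[1:])
dict_of_arrays = {f"range{k}": g for k, g in zip(keys, groups)}
```

Sorting first means this should also work when rows with the same index are not adjacent.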

Great answer thanks, I thought there would have been some sort of function built in but good to know for the future. I've been trying to keep all in numpy, so I didn't go with this, but i'll keep it saved for sure, thanks! — Ben Cowley, Apr 26 '21 at 12:31

score 1 · Accepted Answer · answered Apr 21 '21 at 22:13

You can create a numpy.array with your ranges like so:

import numpy as np

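# `a` is the stacked array from the question (named `array` there)
indices = np.unique(a[:, 0])
size = len(indices)
ranges = np.zeros((size,), dtype=object)  # object array: one sub-array per index value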

for i in range(size):
    ranges[i] = a[a[:, 0] == indices[i]]

Then, if you print out ranges, you get each of your desired arrays. The index value (1, 2 or 4 in your case) that corresponds to an item in ranges is the element at the same position in indices.

>>> list(ranges)
[array([[1, 0, 1, 2, 3, 4],
        [1, 5, 6, 7, 8, 9]]),
 array([[ 2, 10, 11, 12, 13, 14],
        [ 2, 15, 16, 17, 18, 19]]),
 array([[ 4, 20, 21, 22, 23, 24]])]
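
If you want to look a group up by its index value instead of by its position in ranges, one option is a dict keyed by the values in indices. This is only a sketch: by_index is a name I made up, and the array is rebuilt here so the snippet runs on its own.

```python
import numpy as np

# Rebuild the array from the question (called `a` in this answer).
data = np.arange(25).reshape(5, 5)
a = np.hstack((np.array([[1, 1, 2, 2, 4]]).T, data))

indices = np.unique(a[:, 0])

# Map each index value to its rows; int() turns the NumPy scalar into a plain key.
by_index = {int(i): a[a[:, 0] == i] for i in indices}

print(by_index[4])  # [[ 4 20 21 22 23 24]]
```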

score 1 · Answer 3 · answered Apr 23 '21 at 11:36

I would suggest using a nested list for easier iteration and simply comparing the index of the current row with that of the previous one. Afterwards you can access each group in the list by its position.

import numpy as np

data = np.arange(25).reshape(5,5)
indexList = np.array(([[1,1,2,2,4]]))
indexList = np.transpose(indexList)
array = np.hstack((indexList, data))

ranges = [[]]      # not named `range`, to avoid shadowing the built-in
n = array[0][0]    # index value of the first row
for row in array:
    if row[0] != n:
        # index changed: remember it and start a new group
        n = row[0]
        ranges.append([])
    ranges[-1].append(row)
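
The same compare-with-the-previous-row idea is roughly what itertools.groupby from the standard library does for consecutive items. Here is a minimal sketch, assuming rows with the same index are already next to each other, as they are in the question. The name ranges and the conversion of each group back to a 2-D array are my own choices.

```python
from itertools import groupby

import numpy as np

# Rebuild the array from the question.
data = np.arange(25).reshape(5, 5)
array = np.hstack((np.array([[1, 1, 2, 2, 4]]).T, data))

# groupby starts a new group whenever the key (the first column) changes,
# so rows with the same index must already be adjacent.
ranges = [np.array(list(rows)) for _, rows in groupby(array, key=lambda row: row[0])]
```

If the rows are not grouped yet, they would need to be sorted by the first column beforehand.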
