Select rows by minimum values of a column considering unique values of another column (numpy array)

Question

I want to select only the rows for each unique value of a column (first column) that have a minimum value in another column (second column).

How can I do it?

Let's say I have this array:

[[10, 1], [10, 5], [10, 2], [20, 4], [20, 1], [20, 7], [20, 2], [40, 7], [40, 4], [40, 5]]

I would like to obtain the following array:

[[10, 1], [20, 1], [40, 4]]

I tried selecting the rows for each unique value this way:

d = {i: array[array[:, 0] == i] for i in np.unique(array[:, 0])}

but then I don't know how to pick the row with the minimum value in the second column.

score 0 · Answer 1 · answered Feb 21 '22 at 17:03

What you want is the idea of groupby, as implemented in pandas for instance. As we don't have that in numpy, let's implement something similar to this other answer.

Let's call your input array A. So first, sort the rows by the values in the first column. We do this so that all entries with the same value appear one after the other.

sor = A[A[:,0].argsort()]

And get the indices where new unique values are found.

uniq=np.unique(sor[:,0],return_index=True)[1]
print(uniq)

>>> array([0, 3, 7])

This indicates the places of the array where we need to cut to get groups. Now split the second column into such groups. That way you get chunks of elements of the second column, grouped by the elements on the first column.

grp=np.split(sor[:,1],uniq[1:])
print(grp)
>>> [array([1, 5, 2]), array([4, 1, 7, 2]), array([7, 4, 5])]

The last step is to get the index of the minimum value in each of these groups:

ind=np.array(list(map(np.argmin,grp))) + uniq
print(ind)
>>> array([0, 4, 8])

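The first part maps the np.argmin function over every group in grp, which gives the position of each minimum within its own group. The + uniq part adds the start position of each group, turning those within-group positions into row indices of the sorted array sor.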
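To make the offset arithmetic concrete, here is a small sketch that reuses the numbers printed above. The names starts, groups, local and global_idx are only illustrative and are not part of the answer's code.

```python
import numpy as np

# Start position of each group in the sorted array (the `uniq` printed above).
starts = np.array([0, 3, 7])
# Second-column values split into groups (the `grp` printed above).
groups = [np.array([1, 5, 2]), np.array([4, 1, 7, 2]), np.array([7, 4, 5])]

# Position of each minimum *within* its own group.
local = np.array([np.argmin(g) for g in groups])
# Adding each group's start position gives row indices into the sorted array.
global_idx = local + starts

print(local)       # [0 1 1]
print(global_idx)  # [0 4 8]
```

Without the offset, the indices 0, 1, 1 would all point into the first group of the sorted array.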

Now you only need to index your sorted array using these indices.

print(sor[ind])
>>> array([[10,  1],
       [20,  1],
       [40,  4]])
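For convenience, the steps above can be collected into one function. This is only a sketch: the name min_rows_per_group is made up here, kind="stable" is an added assumption so that tied rows keep their original order, and the input is assumed to be a non-empty 2-D array.

```python
import numpy as np

def min_rows_per_group(A):
    """Hypothetical helper: for each unique value in column 0,
    return the row with the smallest value in column 1."""
    A = np.asarray(A)
    # Sort rows by the first column so equal keys are contiguous.
    # kind="stable" (an assumption, not in the steps above) preserves
    # the original order of rows that share the same key.
    sor = A[A[:, 0].argsort(kind="stable")]
    # Index of the first row of each group in the sorted array.
    uniq = np.unique(sor[:, 0], return_index=True)[1]
    # Split the second column at the group boundaries.
    grp = np.split(sor[:, 1], uniq[1:])
    # Within-group argmin, shifted by each group's start position.
    ind = np.array([np.argmin(g) for g in grp]) + uniq
    return sor[ind]

A = np.array([[10, 1], [10, 5], [10, 2], [20, 4], [20, 1],
              [20, 7], [20, 2], [40, 7], [40, 4], [40, 5]])
result = min_rows_per_group(A)
print(result)
```

If a group contains its minimum more than once, np.argmin picks the first occurrence, so exactly one row per unique key is returned.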
