Selecting rows if column values meet certain condition

Question

Given a NumPy array, I want to select all rows where the value in the second column is greater than or equal to a certain threshold. Here is my current attempt:

import numpy as np

#inp = input("Input N : ")
#N = float(inp);

N = 5

#ids = np.arange(1, N+1, 1)
#scores = np.random.uniform(low=2.0, high=6.0, size=(N,))

ids = [ 1.,          2.,          3.,          4.,          5.,        ]
scores = [ 3.75320381,  4.32400937,  2.43537978,  3.73691774,  2.5163266, ]

ids_col = ids.copy()
scores_col = scores.copy()

students_mat = np.column_stack([ids_col, scores_col])

accepted = scores_col[scores_col[:]>=4.0]

accepted_std = students_mat[:, accepted]

print(accepted_std)

Error

>>> (executing file "arrays.py")
Traceback (most recent call last):
  File "D:\I (Blank Space)\Python\arrays.py", line 19, in <module>
    accepted = scores_col[scores_col[:]>=4.0]
TypeError: '>=' not supported between instances of 'list' and 'float'

>>>

`scores_col` (or `scores`) is a 1D array, so `scores_col[:,1]` won't work. — Divakar, May 02 '17 at 11:23

Michael Gecht · Answer 1 · 2017-05-02T19:26:12.723


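To answer your initial question: you need to define both `ids` and `scores` as NumPy arrays (`np.array`). A plain Python list does not support element-wise comparison, which is why `scores_col[:]>=4.0` raised the `TypeError`; with an array, the comparison returns a boolean mask instead. This makes your code work until you try to define `accepted_std`, because `accepted` contains score values rather than indices, so it cannot be used to index `students_mat`: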

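import numpy as np

# Use NumPy arrays instead of plain lists so that comparisons are element-wise
ids = np.array([1, 2, 3, 4, 5])
scores = np.array([3.75320381, 4.32400937, 2.43537978, 3.73691774, 2.5163266])

ids_col = ids.copy()
scores_col = scores.copy()

students_mat = np.column_stack([ids_col, scores_col])

# scores_col >= 4.0 is a boolean mask; indexing with it keeps only the matching scores
accepted = scores_col[scores_col >= 4.0]

print(accepted)  # [4.32400937]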
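A minimal sketch of the two failures in the question, using the same five scores: comparing a list with a float raises the `TypeError` from the traceback, and even after switching to arrays, the float values in `accepted` cannot serve as column indices. The names `scores_list`, `list_error` and `index_error` are illustrative only.

```python
import numpy as np

scores_list = [3.75320381, 4.32400937, 2.43537978, 3.73691774, 2.5163266]

# 1) A plain list has no element-wise comparison: this raises TypeError
try:
    scores_list >= 4.0
    list_error = None
except TypeError as err:
    list_error = err
print("list >= float:", list_error)

# 2) As an ndarray, the comparison yields a boolean mask (one entry per score)
scores_arr = np.asarray(scores_list)
mask = scores_arr >= 4.0
print(mask)              # [False  True False False False]
accepted = scores_arr[mask]
print(accepted)          # [4.32400937]

# 3) `accepted` holds score values (floats), not positions, so using it as a
#    column index fails with IndexError
students_mat = np.column_stack([np.arange(1, 6), scores_arr])
try:
    students_mat[:, accepted]
    index_error = None
except IndexError as err:
    index_error = err
print("float index:", index_error)
```

The mask from step 2 is what you actually want to index `students_mat` with, applied to rows rather than columns.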

I think what you actually want is to get all rows where the score is at or above a certain threshold. For this, you can change your code to:

import numpy as np

ids = np.array([1, 2, 3, 4, 5])
scores = np.array([3.75320381, 4.32400937, 2.43537978, 3.73691774, 2.5163266])

students_mat = np.column_stack([ids, scores])

# students_mat[:, 1] is the score column (students_mat[1] would be the second row)
accepted_std = students_mat[np.where(students_mat[:, 1] >= 4.)]

print(accepted_std)
# Prints the single matching row: id 2.0 with score 4.32400937
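The same idea can be wrapped in a small helper so that the column and the threshold become parameters. `rows_where_at_least` is a hypothetical name used only for illustration; it applies a plain boolean mask, which selects the same rows as wrapping that mask in `np.where`.

```python
import numpy as np

def rows_where_at_least(mat, col, threshold):
    """Return the rows of a 2-D array whose value in column `col` is >= threshold.

    Hypothetical helper for illustration: it builds a boolean mask from one
    column and uses it to select rows of the whole matrix.
    """
    mat = np.asarray(mat)
    mask = mat[:, col] >= threshold  # one boolean per row, shape (n_rows,)
    return mat[mask]

ids = np.arange(1, 6)
scores = np.array([3.75320381, 4.32400937, 2.43537978, 3.73691774, 2.5163266])
students_mat = np.column_stack([ids, scores])

print(rows_where_at_least(students_mat, 1, 4.0))  # only the row with id 2
print(rows_where_at_least(students_mat, 1, 3.7))  # rows with ids 1, 2 and 4
```

If no row qualifies, the result is simply an empty array with shape `(0, 2)`, so callers do not need a special case.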

edited May 02 '17 at 19:26

answered May 02 '17 at 11:55


How can I index `students_mat` using `accepted`? Or is that not possible at all? – user366312 May 02 '17 at 18:45
Why would you want to index it using `accepted`? With the proposed method you immediately slice from the whole `numpy` array, instead of first getting all the values for `accepted` and afterwards searching for them inside your `students_mat` array. – Michael Gecht May 02 '17 at 18:49
Your second routine doesn't compile. – user366312 May 02 '17 at 18:55
What Python version are you using? Ignore the very last `array([...])`, as it is the output of the `print()` statement. – Michael Gecht May 02 '17 at 18:57
`'3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 11:57:41) [MSC v.1900 64 bit (AMD64)]'` – user366312 May 02 '17 at 19:05
What is the error message? Copying my code snippet and pasting it into a new `ipython` session works for me. – Michael Gecht May 02 '17 at 19:17
`>>> (executing file "arrays.py") D:\Python\arrays.py:31: VisibleDeprecationWarning: boolean index did not match indexed array along dimension 0; dimension is 5 but corresponding boolean dimension is 2 accepted_std = students_mat[students_mat[1] >= 4.] [[ 2. 4.32400937]] >>> ` – user366312 May 02 '17 at 19:23
It's only a DeprecationWarning, but I've updated the code. You need to wrap the boolean array in a `np.where`. As described [here](http://stackoverflow.com/a/33421185/4791226). – Michael Gecht May 02 '17 at 19:27
Why do you use `np.where` in the second block, but not the first one? – gargoylebident Sep 20 '20 at 21:57
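A sketch of what the warning in the thread points at, assuming the same data as above: `students_mat[1]` is the second row (two values), not the second column (five values), so the boolean index had length 2 against 5 rows. Later NumPy versions turn that mismatch into an `IndexError` rather than a warning. Wrapping the mask in `np.where` only converts it to integer row positions; with the column `students_mat[:, 1]`, a plain boolean mask and `np.where` select the same rows. For the question about indexing with `accepted`: since it holds values rather than positions, one option is `np.isin`, which here relies on exact float equality (true only because the values are copied, not recomputed).

```python
import numpy as np

ids = np.array([1, 2, 3, 4, 5])
scores = np.array([3.75320381, 4.32400937, 2.43537978, 3.73691774, 2.5163266])
students_mat = np.column_stack([ids, scores])

print(students_mat[1].shape)     # (2,)  -> a row: id and score of student 2
print(students_mat[:, 1].shape)  # (5,)  -> the score column

mask = students_mat[:, 1] >= 4.0          # boolean, one entry per row
by_mask = students_mat[mask]
by_where = students_mat[np.where(mask)]   # np.where(mask) -> (array([1]),)
print(np.array_equal(by_mask, by_where))  # True

# Going from the selected *values* back to rows: match them with np.isin.
# Exact float comparison is safe here only because `accepted` contains
# copies of the stored scores.
accepted = scores[scores >= 4.0]
by_values = students_mat[np.isin(students_mat[:, 1], accepted)]
print(np.array_equal(by_values, by_mask))  # True
```

In practice the boolean mask is the simpler route; matching values back with `np.isin` is only worth it when the values come from somewhere else.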
