How to return values in the second column greater than 25 from a random array in numpy?

Question

I have an array that looks like this:

import numpy as np

z=np.random.randint(101,size=(5,3))

array([[41, 98, 63],
       [61, 65, 66],
       [21,  3, 90],
       [53, 60, 26],
       [60, 18, 19]])

I want to return the values in the second column that are greater than 25, so that my result would be:

array([[98],
       [65],
       [60]])

I tried to create a condition like this:

condition = z[:,1:2] > 25

but when I tried to run:

 z[condition]

I get an error:

/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:1: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.
  """Entry point for launching an IPython kernel.
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
 in ()
----> 1 z[condition]

IndexError: boolean index did not match indexed array along dimension 1; dimension is 3 but the corresponding boolean dimension is 1

Can someone help, please?

Can you share some context for this? What kind of data is it, why is it in a NumPy array? — AMC, Jan 27 '20 at 01:47
https://stackoverflow.com/questions/23911875/select-certain-rows-condition-met-but-only-some-columns-in-python-numpy — AMC, Jan 27 '20 at 02:03
https://stackoverflow.com/questions/4455076/how-to-access-the-ith-column-of-a-numpy-multidimensional-array?rq=1 — AMC, Jan 27 '20 at 02:05
Does this answer your question? [Select certain rows (condition met), but only some columns in Python/Numpy](https://stackoverflow.com/questions/23911875/select-certain-rows-condition-met-but-only-some-columns-in-python-numpy) — AMC, Jan 27 '20 at 02:15

score 3 · Answer 1 · answered Jan 26 '20 at 06:46

You should look at what condition is giving you:

> a[:,1:2] > 25
array([
   [ True],
   [ True],
   [False],
   [ True],
   [False]])

That's a boolean array of shape (5, 1), which doesn't match the (5, 3) array and is probably not the shape you want. If instead you make the condition:

> a[:,1] > 25
array([ True,  True, False,  True, False])

You get a one-dimensional array you can use to index the single column:

> condition = a[:,1] > 25
> a[:,1:2][condition]

array([
   [98],
   [65],
   [60]
])

If you just want a flat result, you can use the same mask:

> a[:,1][condition]

array([98, 65, 60])
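The steps above can be put together as a small runnable sketch. It assumes a fixed copy of the array printed in the question (called `a` here, `z` in the question) rather than a fresh random one, so the output is reproducible; the variable names `mask_2d`, `mask_1d`, `column` and `flat` are only illustrative.

```python
import numpy as np

# Fixed copy of the array shown in the question (assumption: used instead
# of np.random.randint so that the result is reproducible).
a = np.array([[41, 98, 63],
              [61, 65, 66],
              [21,  3, 90],
              [53, 60, 26],
              [60, 18, 19]])

mask_2d = a[:, 1:2] > 25   # shape (5, 1): does not match a's shape (5, 3)
mask_1d = a[:, 1] > 25     # shape (5,): one boolean per row

# Indexing the full array with the (5, 1) mask reproduces the IndexError.
try:
    a[mask_2d]
    raised = False
except IndexError:
    raised = True

column = a[:, 1:2][mask_1d]  # rows picked from the (5, 1) slice, shape (3, 1)
flat = a[:, 1][mask_1d]      # same values as a 1-D array, shape (3,)

print(raised)
print(column)
print(flat)
```

The one-dimensional mask works in both cases because it only selects along the first axis (rows), so it is valid for any slice that still has five rows.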

score -1 · Accepted Answer · answered Jan 26 '20 at 06:49


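import numpy as np
import pandas as pd

z = np.random.randint(101, size=(5, 3))
dfx = pd.DataFrame(data=z, columns='A B C'.split())
# Keep the values of column B that are greater than 25
selected = dfx.B[dfx.B > 25]
# Convert to a NumPy column vector of shape (n, 1)
y = np.array(selected).reshape(-1, 1)
print(y)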


Bhosale Shrikant


This solution is incredibly, ridiculously over-engineered. Having to use Pandas for this trivial operation is astounding. If `arr` is the NumPy array and `df` is the DataFrame created from that array, then the solution is simply `df.loc[df[1].gt(25), 1].to_numpy()`. – AMC Jan 27 '20 at 02:14
@AMC I got an error running your solution: "'numpy.ndarray' object has no attribute 'loc'" – Bhosale Shrikant Jan 27 '20 at 04:08
I got array([58, 47]); the questioner wanted the output in a different shape. But I learnt from your inputs, thanks. My code could have been a little simpler @AMC – Bhosale Shrikant Jan 27 '20 at 04:24
I would like some clarification from OP on the matter, actually. I don’t see any evidence that the shape of example output was a deliberate choice, it’s a small thing. In any case, adding the dimension should be trivial, either with indexing and `numpy.newaxis`, `numpy.reshape()`, or `numpy.expand_dims()`. – AMC Jan 27 '20 at 04:59
You also didn’t explain in your answer why that solution is better than the alternative(s). – AMC Jan 27 '20 at 04:59
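For reference, the three ways of adding the dimension mentioned in the comment above can be sketched as follows. This is only an illustration: the input `flat` is assumed to be the 1-D result `array([98, 65, 60])` from the question's example, and the names `col_newaxis`, `col_reshape` and `col_expand` are made up for the sketch.

```python
import numpy as np

# Assumed 1-D result of the selection, shape (3,)
flat = np.array([98, 65, 60])

# Three equivalent ways to turn it into a column of shape (3, 1)
col_newaxis = flat[:, np.newaxis]          # indexing with numpy.newaxis
col_reshape = flat.reshape(-1, 1)          # -1 lets NumPy infer the row count
col_expand = np.expand_dims(flat, axis=1)  # insert a new axis at position 1

print(col_newaxis.shape, col_reshape.shape, col_expand.shape)
```

All three give the same values in the same shape, so the choice is mostly a matter of readability.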
