
I have an array that looks like this:

import numpy as np

z = np.random.randint(101, size=(5, 3))

array([[41, 98, 63],
       [61, 65, 66],
       [21,  3, 90],
       [53, 60, 26],
       [60, 18, 19]])

I want to return the values in the second column that are greater than 25, so that my answer will be:

array([[98],
       [65],
       [60]])

I tried to create a condition like this:

condition = z[:,1:2] > 25

but when I tried to run:

 z[condition]

I get an error:

/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:1: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.

  """Entry point for launching an IPython kernel.
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
 in ()
----> 1 z[condition]

IndexError: boolean index did not match indexed array along dimension 1; dimension is 3 but the corresponding boolean dimension is 1

Can someone help, please?

Hamed Moghadasi
younes
  • Can you share some context for this? What kind of data is it, why is it in a NumPy array? – AMC Jan 27 '20 at 01:47
  • https://stackoverflow.com/questions/23911875/select-certain-rows-condition-met-but-only-some-columns-in-python-numpy – AMC Jan 27 '20 at 02:03
  • https://stackoverflow.com/q/22927181/11301900 – AMC Jan 27 '20 at 02:04
  • https://stackoverflow.com/questions/4455076/how-to-access-the-ith-column-of-a-numpy-multidimensional-array?rq=1 – AMC Jan 27 '20 at 02:05
  • Does this answer your question? [Select certain rows (condition met), but only some columns in Python/Numpy](https://stackoverflow.com/questions/23911875/select-certain-rows-condition-met-but-only-some-columns-in-python-numpy) – AMC Jan 27 '20 at 02:15

2 Answers


You should look at what condition is giving you:

> a[:,1:2] > 25
array([[ True],
       [ True],
       [False],
       [ True],
       [False]])

That's probably not the shape you want. If instead you make the condition:

> a[:,1] > 25
array([ True,  True, False,  True, False])

You get a one-dimensional array you can use to index the single column:

> condition = a[:,1] > 25
> a[:,1:2][condition]

array([[98],
       [65],
       [60]])
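You can check the shape mismatch behind the question's IndexError directly. This minimal sketch hard-codes the question's printed array instead of calling np.random.randint, so the output is reproducible. The mask names mask_2d and mask_1d are chosen only for illustration:

```python
import numpy as np

# The question's printed array, hard-coded so results are reproducible.
z = np.array([[41, 98, 63],
              [61, 65, 66],
              [21,  3, 90],
              [53, 60, 26],
              [60, 18, 19]])

# Slicing with 1:2 keeps the column axis, so this mask is 2-D.
mask_2d = z[:, 1:2] > 25
print(mask_2d.shape)  # (5, 1)

# Integer indexing with 1 drops the column axis, so this mask is 1-D.
mask_1d = z[:, 1] > 25
print(mask_1d.shape)  # (5,)

# A boolean index must match the shape of the axes it covers. A (5, 1) mask
# against a (5, 3) array disagrees along axis 1, so NumPy raises IndexError.
raised = False
try:
    z[mask_2d]
except IndexError as err:
    raised = True
    print("IndexError:", err)

# The 1-D mask selects whole rows; slicing 1:2 then keeps column 1 as a column.
print(z[mask_1d][:, 1:2])
```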

If you just want a flat result, you can use the same mask:

> a[:,1][condition]

array([98, 65, 60])
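To get the values back as a column, matching the question's expected output, you can give the flat result a trailing axis. The sketch below again uses the question's printed array in place of random data. It compares np.newaxis, reshape(), and np.expand_dims() with a single indexing step that masks rows and slices the column at the same time:

```python
import numpy as np

z = np.array([[41, 98, 63],
              [61, 65, 66],
              [21,  3, 90],
              [53, 60, 26],
              [60, 18, 19]])

condition = z[:, 1] > 25       # 1-D mask, shape (5,)
flat = z[:, 1][condition]      # shape (3,)

# Three ways to turn the flat result into a (3, 1) column:
col_newaxis = flat[:, np.newaxis]          # np.newaxis is an alias for None
col_reshape = flat.reshape(-1, 1)          # -1 lets NumPy infer the row count
col_expand = np.expand_dims(flat, axis=1)  # insert a new axis at position 1

# Or do it in one step: boolean mask on the rows, slice 1:2 on the columns.
col_direct = z[condition, 1:2]

for col in (col_newaxis, col_reshape, col_expand, col_direct):
    print(col.shape, col.ravel())  # (3, 1) [98 65 60]
```

The one-step form avoids building the intermediate flat array. The other three are useful when you already have a 1-D result and only need to restore the column shape.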
Mark
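import numpy as np
import pandas as pd

z = np.random.randint(101, size=(5, 3))
# Wrap the array in a DataFrame with named columns A, B, C
dfx = pd.DataFrame(data=z, columns='A B C'.split())
# Keep the values of column B greater than 25 and reshape them into an (n, 1) column
y = np.array(dfx.B[dfx.B > 25]).reshape(len(dfx.B[dfx.B > 25]), 1)
print(y)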
  • This solution is incredibly, ridiculously over-engineered. Having to use Pandas for this trivial operation is astounding. If `arr` is the NumPy array and `df` is the DataFrame created from that array, then the solution is simply `df.loc[df[1].gt(25), 1].to_numpy()`. – AMC Jan 27 '20 at 02:14
  • @AMC I got an error running your solution: 'numpy.ndarray' object has no attribute 'loc' – Bhosale Shrikant Jan 27 '20 at 04:08
  • I got array([58, 47]), but the questioner wanted the output in a different shape. Still, I learned from your input, thanks; my code could have been a little simpler. @AMC – Bhosale Shrikant Jan 27 '20 at 04:24
  • I would like some clarification from the OP on the matter, actually. I don’t see any evidence that the shape of the example output was a deliberate choice; it’s a small thing. In any case, adding the dimension should be trivial, either with indexing and `numpy.newaxis`, `numpy.reshape()`, or `numpy.expand_dims()`. – AMC Jan 27 '20 at 04:59
  • You also didn’t explain in your answer why that solution is better than the alternative(s). – AMC Jan 27 '20 at 04:59
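Here is AMC's pandas one-liner from the comments as a runnable sketch. It assumes the DataFrame is built from the array with default integer column labels, so `df[1]` is the second column. The names `arr` and `df` follow that comment, and the fixed data is the question's printed array. This setup also explains the reported AttributeError: `.loc` exists only on pandas objects, not on a NumPy array.

```python
import numpy as np
import pandas as pd

arr = np.array([[41, 98, 63],
                [61, 65, 66],
                [21,  3, 90],
                [53, 60, 26],
                [60, 18, 19]])

# Without a columns= argument, the columns are labelled 0, 1, 2.
df = pd.DataFrame(arr)

# .loc takes a boolean row mask and a column label; .gt(25) is the method form of > 25.
# Call .loc on df, not arr: a NumPy array has no .loc attribute.
flat = df.loc[df[1].gt(25), 1].to_numpy()
print(flat)  # [98 65 60]

# Restore the (n, 1) column shape from the question's expected output.
column = flat.reshape(-1, 1)
print(column.shape)  # (3, 1)
```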