How to return a set of rows of a NumPy Matrix that would match a given condition?

This is a NumPy matrix object:

>>> X

matrix([['sunny', 'hot', 'high', 'FALSE'],
        ['sunny', 'hot', 'high', 'TRUE'],
        ['overcast', 'hot', 'high', 'FALSE'],
        ['rainy', 'mild', 'high', 'FALSE'],
        ['rainy', 'cool', 'normal', 'FALSE'],
        ['rainy', 'cool', 'normal', 'TRUE'],
        ['overcast', 'cool', 'normal', 'TRUE'],
        ['sunny', 'mild', 'high', 'FALSE'],
        ['sunny', 'cool', 'normal', 'FALSE'],
        ['rainy', 'mild', 'normal', 'FALSE'],
        ['sunny', 'mild', 'normal', 'TRUE'],
        ['overcast', 'mild', 'high', 'TRUE'],
        ['overcast', 'hot', 'normal', 'FALSE'],
        ['rainy', 'mild', 'high', 'TRUE']], 
       dtype='|S8')

I would like to get all rows whose first column value is 'rainy', so I tried this:

>>> X[X[:,0]=='rainy']

matrix([['rainy', 'rainy', 'rainy', 'rainy', 'rainy']], 
       dtype='|S8')

But I wanted an output like this

matrix([['rainy', 'mild', 'high', 'FALSE'],
        ['rainy', 'cool', 'normal', 'FALSE'],
        ['rainy', 'cool', 'normal', 'TRUE'],
        ['rainy', 'mild', 'normal', 'FALSE'],
        ['rainy', 'mild', 'high', 'TRUE']], 
       dtype='|S8')

How should this be done?

Ébe Isaac
  • This is yet another reason not to use `np.matrix` - you're getting screwed over by `np.matrix`'s insistence on always being 2D. – user2357112 Mar 24 '16 at 17:27
  • @user2357112: Thanks for the info. But could you suggest an alternative NumPy structure that can handle string data but with the flexibility of a NumPy array class? – Ébe Isaac Mar 24 '16 at 17:30
  • `numpy.array`, of course. – user2357112 Mar 24 '16 at 17:31
  • Well that worked like a charm! Thanks @user2357112; I didn't notice that `numpy.array` is suitable for strings too. That really does solve my problem. By the way, what is the use of `numpy.matrix` anyway while `numpy.array` is better? (Should I post this as another question in SO?) – Ébe Isaac Mar 24 '16 at 17:39
  • @ÉbeIsaac: see [here](http://stackoverflow.com/questions/4151128/what-are-the-differences-between-numpy-arrays-and-matrices-which-one-should-i-u) for an explanation. And note that since we now have @ for matrix multiplication in modern python, numpy matrices lost their one great syntactic advantage (multiplication syntax.) – DSM Mar 24 '16 at 17:39
  • [`numpy.matrix` exists for the sole purpose of making it easier to teach people who would be confused by the `.dot` syntax.](https://www.python.org/dev/peps/pep-0465/#transparent-syntax-is-especially-crucial-for-non-expert-programmers) – user2357112 Mar 24 '16 at 17:44
  • Wow! The matrix structure is present for just the pedagogical use case and code simplicity. Thanks for the info user2357112 and @DSM. – Ébe Isaac Mar 24 '16 at 17:54
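
To illustrate the switch to `numpy.array` suggested in the comments, here is a minimal sketch. The four-row `X` is an assumed subset of the question's data, and the names `mask` and `rainy_rows` are made up for the example. With a plain array, `X[:, 0]` is one-dimensional, so the boolean mask can index the rows directly.

```python
import numpy as np

# Assumed subset of the weather data from the question, stored as a plain ndarray.
X = np.array([
    ['sunny', 'hot', 'high', 'FALSE'],
    ['rainy', 'mild', 'high', 'FALSE'],
    ['rainy', 'cool', 'normal', 'TRUE'],
    ['overcast', 'cool', 'normal', 'TRUE'],
])

# For an ndarray, X[:, 0] is 1-D, so the comparison gives one boolean per row.
mask = X[:, 0] == 'rainy'

# Indexing with the 1-D mask keeps every row where the mask is True.
rainy_rows = X[mask]
print(rainy_rows)
```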

2 Answers

>>> X[(X[:, 0] == 'rainy').ravel(), :]
matrix([['rainy', 'mild', 'high', 'FALSE'],
        ['rainy', 'cool', 'normal', 'FALSE'],
        ['rainy', 'cool', 'normal', 'TRUE'],
        ['rainy', 'mild', 'normal', 'FALSE'],
        ['rainy', 'mild', 'high', 'TRUE']], 
       dtype='|S8')

If you look at the result of your comparison:

>>> X[:, 0] == 'rainy'
array([[False],
       [False],
       [False],
       [ True],
       [ True],
       [ True],
       [False],
       [False],
       [False],
       [ True],
       [False],
       [False],
       [False],
       [ True]], dtype=bool)

This needs to be flattened into a vector using ravel:

>>> (X[:, 0] == 'rainy').ravel()
array([False, False, False,  True,  True,  True, False, False, False,
        True, False, False, False,  True], dtype=bool)
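
The shape difference can be checked without `np.matrix` at all. The sketch below is only an illustration under assumptions: it uses a three-row subset of the data as a plain `ndarray`, and the slice `X[:, 0:1]` stands in for the always-2-D matrix column. The names `column`, `mask_2d` and `mask_1d` are invented for the example.

```python
import numpy as np

# Assumed subset of the question's data as a plain ndarray.
X = np.array([
    ['sunny', 'hot', 'high', 'FALSE'],
    ['rainy', 'mild', 'high', 'FALSE'],
    ['rainy', 'cool', 'normal', 'TRUE'],
])

# Slicing with 0:1 keeps the column 2-D, mimicking a matrix column.
column = X[:, 0:1]
mask_2d = column == 'rainy'   # column-shaped mask, shape (3, 1)
mask_1d = mask_2d.ravel()     # flattened mask, shape (3,)

# One boolean per row selects whole rows.
rainy_rows = X[mask_1d, :]
print(mask_2d.shape, mask_1d.shape)
print(rainy_rows)
```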

For additional constraints, this works:

>>> X[(X[:, 0] == 'rainy').ravel() & (X[:, 1] == 'cool').ravel(), :]
matrix([['rainy', 'cool', 'normal', 'FALSE'],
        ['rainy', 'cool', 'normal', 'TRUE']], 
       dtype='|S8')
Alexander
  • Thank you very much. Suppose, I would like to add an extra column constraint like `X[:,1]=='cool'`, where do I add it, Alex? – Ébe Isaac Mar 24 '16 at 17:41
  • A rather complex implementation for something so simple (for Python, that is). Anyway, I can understand it's because of the `numpy.matrix` class, so **answer accepted**! Thanks to the discussion with @user2357112, I'm switching over to `numpy.array` for a much simpler implementation. – Ébe Isaac Mar 24 '16 at 18:02

There is more than one way of doing it.

rows = np.where(X[:, 0] == 'rainy')[0]  # np.where returns a tuple; [0] holds the row indices
X[rows, :]                              # the result you want
Hun
  • Your answer works well for a single column constraint (completely covering my question). Could you expand a little for additional column constraints like `X[:,1]=='mild'`? – Ébe Isaac Mar 24 '16 at 17:57
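
For the additional column constraint asked about in the comment above, one possible approach is to combine the boolean masks with `&` before passing them to `np.where`. This is a sketch assuming `X` is a plain `numpy.array` (as settled on in the question's comments) holding a five-row subset of the data; `rows` and `selected` are names made up for the example.

```python
import numpy as np

# Assumed subset of the question's data as a plain ndarray.
X = np.array([
    ['sunny', 'hot', 'high', 'FALSE'],
    ['rainy', 'mild', 'high', 'FALSE'],
    ['rainy', 'cool', 'normal', 'FALSE'],
    ['rainy', 'mild', 'normal', 'FALSE'],
    ['overcast', 'mild', 'high', 'TRUE'],
])

# Each comparison needs its own parentheses because & binds tighter than ==.
condition = (X[:, 0] == 'rainy') & (X[:, 1] == 'mild')

# np.where on a 1-D condition returns a 1-tuple; [0] extracts the row indices.
rows = np.where(condition)[0]
selected = X[rows, :]
print(rows)
print(selected)
```

The intermediate `np.where` call is optional here: `X[condition]` selects the same rows directly from the boolean mask.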