
One can select elements in NumPy arrays with boolean masks, as follows:

import numpy as np

a = np.random.rand(100)
sel = a > 0.5  # boolean mask: elements greater than 0.5
a[sel] = 0  # do something with the selection

b = np.array(list('abc abc abc'))
b[b == 'a'] = 'A'  # convert all the a's to A's

This property is used by the np.where function to retrieve indices:

indices = np.where(a>0.9)
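
As a small illustration of what `np.where` returns when it is called with only a condition: the result is a tuple with one index array per dimension, so for a 1-D array the indices are its first element. The array values below are made up for the example.

```python
import numpy as np

a = np.array([0.95, 0.10, 0.99, 0.50])  # hypothetical sample values

# With a single argument, np.where returns a tuple of index arrays,
# one per dimension of the input.
indices = np.where(a > 0.9)
idx = indices[0]  # index array for the only dimension

# The same indices can be used to read (or modify) the selected elements.
selected = a[idx]
```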

What I would like to do is use regular expressions in this kind of element selection. For example, to select the elements of b above that match the regexp [Ab], I need to write the following code:

import re

regexp = '[Ab]'
selection = np.array([bool(re.search(regexp, element)) for element in b])

This looks too verbose to me. Is there a shorter and more elegant way to do this?

Boris Gorelik
  • You may have seen this: http://docs.scipy.org/doc/numpy/reference/generated/numpy.fromregex.html, which would be useful if your starting point was a string or a file instead of an array. – Paul Jul 06 '11 at 13:31
  • Paul, thank you. Yes I saw this (this is the first google hit for "numpy regexp"), but this doesn't solve my problem – Boris Gorelik Jul 06 '11 at 19:57

1 Answer


There's some setup involved here, but unless NumPy has some kind of direct support for regular expressions that I don't know about, this is the most "numpytonic" solution. It wraps the per-element regex test in np.vectorize so that it can be applied to the whole array in one call; note that np.vectorize is a convenience wrapper and still loops over the elements in Python.

import numpy as np
import re

r = re.compile('[Ab]')
vmatch = np.vectorize(lambda x:bool(r.match(x)))

A = np.array(list('abc abc abc'))
sel = vmatch(A)
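
For completeness, here is a sketch of how the resulting mask might be used, together with two variations that are not part of the answer above. The first passes `otypes=[bool]` to `np.vectorize` so the output dtype is fixed rather than inferred. The second, valid only for single-character elements and a plain character class, uses `np.isin` as a regex-free alternative. The name `vsearch` is illustrative, and `search` is used instead of `match` to mirror the question's code; for one-character elements the two behave the same.

```python
import re
import numpy as np

r = re.compile('[Ab]')

# otypes fixes the output dtype instead of inferring it from the first call
vsearch = np.vectorize(lambda x: bool(r.search(x)), otypes=[bool])

A = np.array(list('abc abc abc'))
sel = vsearch(A)

# Boolean mask usage: pick the matching elements or their indices
matched = A[sel]
indices = np.where(sel)[0]

# Regex-free alternative; works here only because every element is a single
# character and the pattern is a simple character class
sel_isin = np.isin(A, ['A', 'b'])
```

Both variants still visit every element; the `np.isin` form avoids calling a Python function per element, but it cannot express general patterns.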
Paul
  • From the numpy docs: `The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.` – Ufos Oct 09 '18 at 13:12
  • Link to documentation concerning the performance of [`vectorize`](https://numpy.org/doc/stable/reference/generated/numpy.vectorize.html). – blackbrandt Feb 28 '22 at 14:29