9

How can I create a regex character class that is the intersection of two other character classes? For example, how can I search for consonants using [a-z] and [^aeiou], without explicitly constructing a character class that lists all the consonants, like so:

[bcdfghjlkmnpqrstvwxyz] # explicit consonant regex class
Malik Brahimi

3 Answers

9

This regex should do the trick: (?=[^aeiou])(?=[a-z]).

The first group, (?=[^aeiou]), is a lookahead: it asserts that [^aeiou] can match at the current position without consuming any input. Matching then restarts from that same position with the second lookahead, (?=[a-z]), which works the same way. Together they act like a logical AND: the whole regex matches only if both expressions match.
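As a minimal sketch of the lookahead approach with the standard re module: the version below assumes the trailing "." in the answer's pattern is part of the regex (it is what actually consumes the character once both lookaheads succeed). The names CONSONANT and find_consonants are made up for this illustration.

```python
import re

# Two zero-width lookaheads act as an AND on the next character;
# the final "." (assumed to be part of the pattern) then consumes it.
CONSONANT = re.compile(r"(?=[^aeiou])(?=[a-z]).")


def find_consonants(text):
    """Return every character that is in [a-z] AND in [^aeiou]."""
    return CONSONANT.findall(text)


print(find_consonants("abcde"))
```

Without a consuming token after the lookaheads, the pattern would only produce empty matches at the positions where a consonant starts.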

Community
7

As an alternative to Python's re module, you can do this explicitly with the regex library, which supports set operations for character classes:

The operators, in order of increasing precedence, are:

|| for union (“x||y” means “x or y”)

~~ (double tilde) for symmetric difference (“x~~y” means “x or y, but not both”)

&& for intersection (“x&&y” means “x and y”)

-- (double dash) for difference (“x--y” means “x but not y”)

So to match only consonants, your regular expression could be:

>>> import regex
>>> regex.findall('[[a-z]&&[^aeiou]]+', 'abcde', regex.VERSION1)
['bcd']

Or equivalently using set difference:

>>> regex.findall('[[a-z]--[aeiou]]+', 'abcde', regex.VERSION1)
['bcd']
artu-hnrq
Alex Riley
0

Character class difference and intersection are not available in the re module, so what can you do?

using ranges:

[bcdfghj-np-tv-z]

using the \w character class:

[^\W0-9_aeiouAEIOU]

a lookahead (not very efficient since you need to make a test for each character):

(?:(?![eiou])[b-z])

using the new regex module that has the difference feature:

[[b-z]--[eiou]]
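The three alternatives above that need only the re module can be compared side by side. This is a rough sketch; the dictionary keys, the helper name consonants, and the sample string are arbitrary choices for the illustration.

```python
import re

# The three re-only patterns from the list above.
PATTERNS = {
    "ranges": r"[bcdfghj-np-tv-z]",
    "word_class": r"[^\W0-9_aeiouAEIOU]",
    "lookahead": r"(?:(?![eiou])[b-z])",
}


def consonants(pattern, text):
    """Return all single-character matches of pattern in text."""
    return re.findall(pattern, text)


sample = "regex classes"
for name, pattern in PATTERNS.items():
    print(name, consonants(pattern, sample))
```

On lowercase ASCII input all three agree. The \w variant is broader, though: it also accepts uppercase consonants and, for str patterns in Python 3, non-ASCII letters, so it is not a drop-in replacement when the input is not restricted to a-z.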
Casimir et Hippolyte