I'm really new to regex. I've easily found regexes that match words containing consecutively repeated vowels, but I'm unsure how to match only the words that don't contain them.

I have a .txt file with words like

sheep
fleece
eggs
meat
potato

I want to write a regular expression that matches words in which no vowel is repeated consecutively, so for the list above it would return eggs, meat, and potato.

I'm not very experienced with regex and I've been unable to find anything about how to do this online, so it'd be awesome if someone with more experience could help me out. Thanks!

I'm using Python and have been testing my regex with https://regex101.com.

Thanks!

EDIT: provided incorrect examples of results for the regular expression. Fixed.

notHalfBad
  • As an alternative, why not just proceed with your current solution (match words that have vowels repeated), and then in the rest of your logic just _exclude_ such words? – voithos Aug 10 '16 at 00:18
  • That's a good alternative, except that this is only one of the conditions I'm trying to match. I know how to match all the other conditions, just not this one. – notHalfBad Aug 10 '16 at 00:20
  • @notHalfBad Please clarify: what have you tried and what is the problem with it? – John1024 Aug 10 '16 at 00:25
  • You could potentially use negative-lookarounds (see http://stackoverflow.com/a/406408/716118), although it may not be the best approach. Regarding the alternative, what would be the downside of `re.match() and not re.match()`, besides the need for multiple regexes? – voithos Aug 10 '16 at 00:26
  • @smac89 err, I'm terrible at writing questions correctly. I have updated it, but the key thing is that there are _no_ consecutively repeated vowels in any of those words. – notHalfBad Aug 10 '16 at 00:28
  • @John1024 I have tried matching for (letters that are not [aeiou] | letters that are [aeiou]){1} but after that I have no idea how to match the next letter with whatever of those two previous conditions were *not* matched with the previous letter. Of course, I'm not actually using that syntax but it's long and I don't want to go to the effort of typing the alphabet with all vowels removed. – notHalfBad Aug 10 '16 at 00:33

2 Answers

Note that, since the desired output includes meat but not fleece, the desired words are allowed to have two vowels in a row, just not the same vowel twice in a row.

To select lines in which no vowel is immediately followed by the same vowel:

>>> import re
>>> [w for w in open('file.txt') if not re.search(r'([aeiou])\1', w)]
['eggs\n', 'meat\n', 'potato\n']

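The regex [aeiou] matches any single lowercase vowel (you can include y if you like). The group ([aeiou]) captures that vowel, and the backreference \1 matches the same character again, so ([aeiou])\1 matches any vowel immediately followed by the same vowel. Thus, not re.search(r'([aeiou])\1', w) is true only for strings w that contain no doubled vowel. Each result keeps its trailing '\n' because the lines are read straight from the file.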
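Since the comments mention negative lookarounds and the OP wants this as one condition among several, here is a minimal sketch of the same test folded into a single pattern. It is only one way to do it, not necessarily the best. The name NO_DOUBLED_VOWEL and the in-memory sample text are my own placeholders standing in for the contents of file.txt.

```python
import re

# (?!.*([aeiou])\1) is a negative lookahead: it rejects the line if any vowel
# is immediately followed by the same vowel. \w+ then consumes the word.
# re.MULTILINE lets ^ and $ anchor at every line, not just the whole string.
NO_DOUBLED_VOWEL = re.compile(r'^(?!.*([aeiou])\1)\w+$', re.MULTILINE)

# Placeholder data standing in for the contents of file.txt.
text = "sheep\nfleece\neggs\nmeat\npotato\n"

matches = [m.group(0) for m in NO_DOUBLED_VOWEL.finditer(text)]
print(matches)  # ['eggs', 'meat', 'potato']
```

Because the lookahead consumes no characters, other conditions can be added as further lookaheads after the ^ anchor.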

Addendum

If we wanted to exclude meat because it has two vowels in a row, even though they are not the same vowel, then:

>>> [w for w in open('file.txt') if not re.search(r'[aeiou]{2}', w)]
['eggs\n', 'potato\n']
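For reuse, both variants can be wrapped in one small helper. This is a sketch under my own assumptions: the function name filter_words and the strict flag are hypothetical, and I have added re.IGNORECASE and stripped the trailing newlines, which the one-liners above do not do.

```python
import re

# Same vowel twice in a row, e.g. "ee" in "sheep".
SAME_VOWEL_TWICE = re.compile(r'([aeiou])\1', re.IGNORECASE)
# Any two vowels in a row, e.g. "ea" in "meat".
ANY_TWO_VOWELS = re.compile(r'[aeiou]{2}', re.IGNORECASE)

def filter_words(lines, strict=False):
    """Return words with no doubled vowel; if strict, with no adjacent vowels at all."""
    pattern = ANY_TWO_VOWELS if strict else SAME_VOWEL_TWICE
    words = (line.strip() for line in lines)  # drop trailing newlines
    return [w for w in words if w and not pattern.search(w)]

# Placeholder data standing in for the lines of file.txt.
sample = ['sheep\n', 'fleece\n', 'eggs\n', 'meat\n', 'potato\n']
print(filter_words(sample))               # ['eggs', 'meat', 'potato']
print(filter_words(sample, strict=True))  # ['eggs', 'potato']
```

With a real file, passing the open file object works the same way, for example `with open('file.txt') as f: filter_words(f)`.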
John1024
  • @smac89 But, `meat` does not have the _same_ vowel repeated like `fleece` does. The OP specified `meat` as being part of the desired output. – John1024 Aug 10 '16 at 00:50
  • @smac89 yeah, that's an error on my part lol, but his answer has solved my problem :D – notHalfBad Aug 10 '16 at 00:51
  • @smac89 I added to the answer a method for excluding any line with consecutive vowels so that `meat` is excluded. – John1024 Aug 10 '16 at 00:53

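@John1024's answer should work. I would also try the following pattern, which matches the words that do contain a doubled vowel, so you would exclude whatever it matches: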

"\w*(a{2,}|e{2,}|i{2,}|o{2,}|u{2,})\w*"ig
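The trailing ig on the pattern above looks like JavaScript-style flags, which Python does not accept in that form. A rough Python equivalent, as I understand the intent, might look like the sketch below. The name DOUBLED_VOWEL_WORD and the sample string are my own, and I switched the group to a non-capturing one so that findall returns whole words rather than just the doubled vowels.

```python
import re

# re.IGNORECASE replaces the "i" flag; findall/finditer replace the "g" flag.
DOUBLED_VOWEL_WORD = re.compile(
    r'\w*(?:a{2,}|e{2,}|i{2,}|o{2,}|u{2,})\w*',
    re.IGNORECASE,
)

# Placeholder data using the words from the question.
text = "sheep fleece eggs meat potato"

# Words that DO contain a doubled vowel.
with_doubled = DOUBLED_VOWEL_WORD.findall(text)
# Invert the test to keep the words the question actually asks for.
without_doubled = [w for w in text.split() if not DOUBLED_VOWEL_WORD.fullmatch(w)]

print(with_doubled)     # ['sheep', 'fleece']
print(without_doubled)  # ['eggs', 'meat', 'potato']
```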

mouse_s