Using Regex in Python to find words with certain characters and without other characters

Question

First off, I am new to regex and am using https://regex101.com/r/arkkVE/3 to help me learn it.

I'd like to find words in a .txt file using re. So far I am able to do this, but the pattern is very verbose and I am trying to cut back on the repeated sequences in it.

Currently this is what I have:

import re

Possibility = list()
with open('5LetterWords.txt') as f:
    for line in f.readlines():
        Possibility += re.findall(
            r'(?=\w)(?=.*[@#t])[\w]+(?=\w)(?=.*[@#o])[\w]+(?=\w)(?=.*[@#u])[\w]+',
            line)
    print(Possibility)

This finds words that have the letters "t" and "o" and "u" in no particular order, which is the first step in what I want.

I want to add additional regex expressions that will omit words that have other characters, but I don't know how to exclude using regex.

As you can see this is starting to get really long and ugly.

Should I be using regex? Is there a better/more concise way to solve this problem?

Thanks

`[tuo][^ay]` will match any word containing any of t,u,o and not containing a,y — norok2, Aug 05 '22 at 14:07
`[tuo][^ay]` does not return words though, just strings of letters. — Kevin Jones, Aug 05 '22 at 14:16
Regex is probably the fastest way in terms of execution time, but as the rules you are applying get more complex it may be more readable and easier to develop if you use more verbose python logic. — Tom Dalton, Aug 05 '22 at 14:16
regex101 lets you save a code sample and share it. Could you provide a complete sample with some sample text and better explain your inclusions and omissions? Are you expecting me to guess what the heck "I want to add additional regex expressions that will omit words that have other characters" means? — MonkeyZeus, Aug 05 '22 at 14:16
@MonkeyZeus https://regex101.com/r/arkkVE/1 -- I'd want to omit from that list, for example words that contain the letter "a" — Kevin Jones, Aug 05 '22 at 14:18
Wait, your input looks like it is JSON... Then please don't go through it with a regex. First parse the JSON and then go through the list you get from that. — trincot, Aug 05 '22 at 14:19
You should really provide the content of your file in the question. From that it is clear that you should not be using regex for extracting words. — norok2, Aug 05 '22 at 14:20
https://regex101.com/r/arkkVE/3 is the updated regex101 that has the original 5 letter word file in it. @MonkeyZeus — Kevin Jones, Aug 05 '22 at 14:40

MonkeyZeus · Answer 1 · 2022-08-05T15:42:43.307

Ideally you would read the file line by line and check each word for the existence of t, o, and u and additionally check that a does not exist.

I'm not a Python dev but this seems relevant: https://stackoverflow.com/a/5189069/2191572

if ('t' in word) and ('o' in word) and ('u' in word) and ('a' not in word):
    print('yay')
else:
    print('nay')
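
Applied to a word list on disk, that check might look like the sketch below. It assumes the file holds one word per line, which may not match the actual file; `matching_words` is a hypothetical helper name, and the filename is the one from the question.

```python
def matching_words(lines):
    """Return words that contain t, o and u but no a."""
    found = []
    for line in lines:
        word = line.strip()  # drop the trailing newline and surrounding spaces
        if not word:
            continue  # skip blank lines
        if ('t' in word) and ('o' in word) and ('u' in word) and ('a' not in word):
            found.append(word)
    return found


if __name__ == "__main__":
    # Assumption: one word per line; filename taken from the question.
    try:
        with open('5LetterWords.txt') as f:
            print(matching_words(f))
    except FileNotFoundError:
        print("5LetterWords.txt not found")
```

Because the helper accepts any iterable of strings, it can be tried on a plain list before pointing it at the file.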

If you insist on regex, then this would work:

^(?=.*t)(?=.*o)(?=.*u)(?!.*a).*$

^ - start-of-line anchor
(?=.*t) - ahead of me there exists a t
(?=.*o) - ahead of me there exists an o
(?=.*u) - ahead of me there exists a u
(?!.*a) - ahead of me there is no a
.* - match everything
$ - end-of-line anchor

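Note: (?!.*a).* can be substituted with [^a]*. Unlike ., the class [^a] also matches newlines, so use [^a\n]* when the input contains several lines.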

https://regex101.com/r/WtVr8S/1
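
For reference, here is a minimal sketch of how that pattern could be run from Python. The sample text is made up and stands in for the file contents, assuming one word per line; `re.MULTILINE` is needed so that `^` and `$` anchor at every line rather than only at the ends of the whole string.

```python
import re

# Pattern from the answer; re.MULTILINE makes ^ and $ match at each line.
PATTERN = re.compile(r'^(?=.*t)(?=.*o)(?=.*u)(?!.*a).*$', re.MULTILINE)

# Made-up stand-in for the file contents (assumption: one word per line).
text = "about\nclout\ndonut\ntatou\ntoque\nzebra"

# With no capture groups, findall returns the full text of each match.
matches = PATTERN.findall(text)
print(matches)
```

Since `.` does not match a newline by default, each lookahead only inspects the line it starts on.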

score 1 · Accepted Answer · answered Aug 05 '22 at 14:32

I guess you could iterate through your list of words and filter out the words you do or don't want, for example:

words = ['about', 'alout', 'aotus', 'apout', 'artou', 'atour', 'blout', 'bottu', 'bouet', 'boult', 'bouto', 'bouts', 'chout', 'clout', 'count', 'court', 'couth', 'crout', 'donut', 'doubt', 'flout', 'fotui', 'fount', 'foute', 'fouth', 'fouty', 'glout', 'gouty', 'gouts', 'grout', 'hoult', 'yourt', 'youth', 'joust', 'keout', 'knout', 'lotus', 'louty', 'louts', 'montu', 'moult', 'mount', 'mouth', 'nobut', 'notum', 'notus', 'plout', 'pluto', 'potus', 'poult', 'pouty', 'pouts', 'roust', 'route', 'routh', 'routs', 'scout', 'shout', 'skout', 'smout', 'snout', 'south', 'spout', 'stoun', 'stoup', 'stour', 'stout', 'tatou', 'taupo', 'thous', 'throu', 'thuoc', 'todus', 'tofus', 'togue', 'tolus', 'tonus', 'topau', 'toque', 'torus', 'totum', 'touch', 'tough', 'tould', 'tourn', 'tours', 'tourt', 'touse', 'tousy', 'toust', 'touts', 'troue', 'trout', 'trouv', 'tsubo', 'voust']
result = []
for word in words:
    if ('a' in word) or ('y' in word):
        continue    # skip words containing an excluded letter
    elif ('t' in word) and ('u' in word) and ('o' in word):
        # keep only words that contain all of t, u and o
        result.append(word)
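
The loop above hard-codes its letters. One possible generalization, sketched here with set operations, takes the required and excluded letters as arguments. `filter_words` and the short sample list are illustrative names and data, not part of the original answer.

```python
def filter_words(words, required, forbidden):
    """Keep words containing every letter in `required` and none in `forbidden`."""
    required = set(required)
    forbidden = set(forbidden)
    return [
        word for word in words
        # subset test: every required letter occurs in the word;
        # isdisjoint: no forbidden letter occurs in the word
        if required <= set(word) and forbidden.isdisjoint(word)
    ]


sample = ['about', 'clout', 'fouty', 'lotus', 'pouty', 'trout']
print(filter_words(sample, required='tou', forbidden='ay'))
```

Adding another excluded letter then only means extending the `forbidden` string rather than writing another condition.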
