
First off, I am new to regex and am using https://regex101.com/r/arkkVE/3 to help me learn it.

I'd like to find words from a .txt file that I have using re. So far I am able to do this, but it is very verbose and I am trying to cut back on repeated sequences of regex expressions.

Currently, this is what I have:

import re

Possibility = list()
with open('5LetterWords.txt') as f:
    for line in f:
        Possibility += re.findall(r'(?=\w)(?=.*[@#t])[\w]+(?=\w)(?=.*[@#o])[\w]+(?=\w)(?=.*[@#u])[\w]+'
        , line)
    print(Possibility)

This finds words that have the letters "t" and "o" and "u" in no particular order, which is the first step in what I want.

I want to add additional regex expressions that will omit words that have other characters, but I don't know how to exclude using regex.

As you can see this is starting to get really long and ugly.

Should I be using regex? Is there a better/more concise way to solve this problem?

Thanks

MonkeyZeus
  • `[tuo][^ay]` will match any word containing any of t,u,o and not containing a,y – norok2 Aug 05 '22 at 14:07
  • `[tuo][^ay]` does not return words though, just strings of letters. – Kevin Jones Aug 05 '22 at 14:16
  • Regex is probably the fastest way in terms of execution time, but as the rules you are applying get more complex it may be more readable and easier to develop if you use more verbose python logic. – Tom Dalton Aug 05 '22 at 14:16
  • regex101 lets you save a code sample and share it. Could you provide a complete sample with some sample text and better explain your inclusions and omissions? Are you expecting me to guess what the heck "I want to add additional regex expressions that will omit words that have other characters" means? – MonkeyZeus Aug 05 '22 at 14:16
  • @MonkeyZeus https://regex101.com/r/arkkVE/1 -- I'd want to omit from that list, for example words that contain the letter "a" – Kevin Jones Aug 05 '22 at 14:18
  • Wait, your input looks like it is JSON... Then please don't go through it with a regex. First parse the JSON and then go through the list you get from that. – trincot Aug 05 '22 at 14:19
  • You should really provide the content of your file in the question. From that it is clear that you should not be using regex for extracting words. – norok2 Aug 05 '22 at 14:20
  • https://regex101.com/r/arkkVE/3 is the updated regex101 that has the original 5 letter word file in it. @MonkeyZeus – Kevin Jones Aug 05 '22 at 14:40
  • Any luck with my answer? – MonkeyZeus Aug 05 '22 at 15:40

2 Answers


Ideally you would read the file line by line and check each word for the existence of t, o, and u and additionally check that a does not exist.

I'm not a Python dev but this seems relevant: https://stackoverflow.com/a/5189069/2191572

if ('t' in word) and ('o' in word) and ('u' in word) and ('a' not in word):
    print('yay')
else:
    print('nay')
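
To make the line-by-line idea concrete, here is a minimal sketch. It assumes the file holds one word per line, which may not match the actual file, and `matching_words` is a hypothetical helper name, not something from the question.

```python
def matching_words(lines):
    """Return the words that contain t, o and u but no a.

    Assumes one word per line (an assumption; adjust the parsing
    if the file is formatted differently).
    """
    result = []
    for line in lines:
        word = line.strip()  # drop the trailing newline and spaces
        if not word:
            continue  # skip blank lines
        if ('t' in word) and ('o' in word) and ('u' in word) and ('a' not in word):
            result.append(word)
    return result
```

Because the function accepts any iterable of lines, an open file object can be passed straight in, for example `with open('5LetterWords.txt') as f: print(matching_words(f))`.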

If you insist on regex, then this would work:

^(?=.*t)(?=.*o)(?=.*u)(?!.*a).*$
  • ^ - start line anchor
  • (?=.*t) - ahead of me there exists a t
  • (?=.*o) - ahead of me there exists an o
  • (?=.*u) - ahead of me there exists a u
  • (?!.*a) - ahead of me there are no a's
  • .* - match everything
  • $ - end line anchor

Note: (?!.*a).* can be substituted with [^a]* when each word is matched on its own, since [^a] also matches newline characters.

https://regex101.com/r/WtVr8S/1
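
If the goal is to avoid typing one lookahead per letter by hand, the pattern above could be generated from the letters themselves. This is only a sketch: `build_pattern` and its parameters are hypothetical names, and it assumes each word is tested separately rather than searching the whole file at once.

```python
import re


def build_pattern(required, forbidden=""):
    """Compile a pattern with one positive lookahead per required
    letter and one negative lookahead for all forbidden letters."""
    lookaheads = "".join(f"(?=.*{re.escape(ch)})" for ch in required)
    if forbidden:
        # A character class lets a single lookahead reject any forbidden letter.
        lookaheads += f"(?!.*[{re.escape(forbidden)}])"
    return re.compile(f"^{lookaheads}.*$")


pattern = build_pattern("tou", "a")
words = ["about", "trout", "youth", "mouse"]
print([w for w in words if pattern.match(w)])
```

For `build_pattern("tou", "a")` the generated expression is `^(?=.*t)(?=.*o)(?=.*u)(?!.*[a]).*$`, which behaves like the hand-written one; adding more required or forbidden letters only changes the arguments.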

MonkeyZeus

You could iterate through your list of words and keep or skip each one depending on the letters it contains, for example:

words = ['about', 'alout', 'aotus', 'apout', 'artou', 'atour', 'blout', 'bottu', 'bouet', 'boult', 'bouto', 'bouts', 'chout', 'clout', 'count', 'court', 'couth', 'crout', 'donut', 'doubt', 'flout', 'fotui', 'fount', 'foute', 'fouth', 'fouty', 'glout', 'gouty', 'gouts', 'grout', 'hoult', 'yourt', 'youth', 'joust', 'keout', 'knout', 'lotus', 'louty', 'louts', 'montu', 'moult', 'mount', 'mouth', 'nobut', 'notum', 'notus', 'plout', 'pluto', 'potus', 'poult', 'pouty', 'pouts', 'roust', 'route', 'routh', 'routs', 'scout', 'shout', 'skout', 'smout', 'snout', 'south', 'spout', 'stoun', 'stoup', 'stour', 'stout', 'tatou', 'taupo', 'thous', 'throu', 'thuoc', 'todus', 'tofus', 'togue', 'tolus', 'tonus', 'topau', 'toque', 'torus', 'totum', 'touch', 'tough', 'tould', 'tourn', 'tours', 'tourt', 'touse', 'tousy', 'toust', 'touts', 'troue', 'trout', 'trouv', 'tsubo', 'voust']
result = []
for word in words:
    if ('a' in word) or ('y' in word):
        continue    # skip words containing an excluded letter
    elif ('t' in word) and ('u' in word) and ('o' in word):
        result.append(word)    # keep words containing all required letters
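
The same filter can be written more compactly with sets, which also makes it easy to change the letters later. This is a sketch rather than a drop-in replacement; `filter_words` and its default arguments are hypothetical and simply mirror the letters used in the loop above.

```python
def filter_words(words, required="tou", forbidden="ay"):
    """Keep words that contain every required letter and no forbidden one."""
    required, forbidden = set(required), set(forbidden)
    return [
        w for w in words
        if required <= set(w)          # all required letters are present
        and forbidden.isdisjoint(w)    # no forbidden letter is present
    ]


print(filter_words(['about', 'clout', 'youth', 'donut', 'gouty', 'tatou']))
```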
Adrian Ang