0

For my own project, I have a .txt file containing 200k English words. I have a class called WordCross (a game) that should search for words built from certain letters, which are passed in as parameters. Suppose I have the letters A X D E L P: I want to return a list of English words that can be made from these letters. I would like to use a regex and add the words that match to a "hits" list, but I can't think of a way to build this regex.

Here is my current code:

import re
class WordCross:
    def __init__(self, a,b,c,d,e,f):
        file = open("english3.txt", "r")
        hits = []
        for words in file:
            if words.lower() == re.search("a", words):
                hits.append(words)
        hits.sort()
        print(hits)

test = WordCross("A", "B", "C", "D", "E", "F")

Any help will be appreciated! Kind regards, Douwe

Daniel Walker
Douwe
  • perhaps `if re.search(f'[{a}{b}{c}{d}{e}{f}]', words) is not None:`? – Nick Jun 12 '20 at 14:18
  • Does this answer your question? [How to use a variable inside a regular expression?](https://stackoverflow.com/questions/6930982/how-to-use-a-variable-inside-a-regular-expression) – Maurice Meyer Jun 12 '20 at 14:19
  • @MauriceMeyer I did take a look at that code, but it only uses a single variable, not several, so it is unclear to me how to do this with multiple variables. – Douwe Jun 12 '20 at 14:25
  • @Nick this does seem to work, but it also accepts strings that contain letters not given as parameters. – Douwe Jun 12 '20 at 14:35
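
To illustrate the difference raised in the last comment: `re.search` with a bare character class succeeds as soon as one of the listed letters appears anywhere in the word, whereas `re.fullmatch` with `+` requires every character to come from the set. The following is only a minimal sketch; the helper names `has_any_letter` and `uses_only_letters` are made up for illustration, and the letters are the example ones from the question.

```python
import re

letters = ["A", "X", "D", "E", "L", "P"]  # example letters from the question

# Build a character class such as [AXDELP]; re.escape guards against
# characters that are special inside a regex.
charset = "[" + "".join(re.escape(ch) for ch in letters) + "]"

contains_any = re.compile(charset, re.IGNORECASE)      # one listed letter is enough
only_these = re.compile(charset + "+", re.IGNORECASE)  # used with fullmatch below


def has_any_letter(word):
    """True if the word contains at least one of the letters."""
    return contains_any.search(word) is not None


def uses_only_letters(word):
    """True if every character of the word is one of the letters."""
    return only_these.fullmatch(word) is not None
```

With these definitions, "banana" passes `has_any_letter` (it contains an "a") but fails `uses_only_letters`, while "apple" passes both.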

4 Answers

1

If you want to return only the words made up entirely of the letters passed to the constructor, use re.match and add an end-of-line anchor ($) to the regex. The negative lookahead at the start of the pattern additionally rejects any word that uses one of the letters more than once. You can use the asterisk operator (*) in the parameter list to accept an arbitrary number of letters (see the manual). In this demo I've simulated reading the file with a list of words built from a string:

wordlist = '''
Founded in two thousand and eight Stack Overflow is the largest most trusted 
online community for anyone that codes to learn share their knowledge and 
build their careers More than fifty million unique visitors come to Stack Overflow
each month to help solve coding problems develop new skills and find job opportunities
'''.split()
wordlist = list(set(wordlist))

import re
class WordCross:
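    def __init__(self, *letters):
        # file = open("english3.txt", "r")
        hits = []
        # Character class of the allowed letters, e.g. [ACEHKTS]
        charset = f"[{''.join(letters)}]"
        # The negative lookahead rejects words in which a letter repeats;
        # {charset}+$ then requires every character to come from the set.
        regex = re.compile(rf"(?!.*({charset}).*\1){charset}+$", re.I)
        for word in wordlist:
            if regex.match(word) is not None:
                hits.append(word)
        hits.sort()
        print(hits)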

test = WordCross("A", "C", "E", "H", "K", "T", "S")

Output:

['Stack', 'each', 'the']
Nick
  • Thanks, this works great! How would I change it if each letter can only be used once? E.g. suppose the letters are A C E H K T S. I want to be able to get the word CET but not CETT. – Douwe Jun 13 '20 at 00:26
  • When I saw `that` come up in my demo answer I was wondering if you were going to ask that question. Give me a few minutes... – Nick Jun 13 '20 at 00:27
  • @Douwe I've updated the regex to include a negative lookahead to ensure no character is repeated – Nick Jun 13 '20 at 00:38
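
As a small illustration of how the negative lookahead behaves, here is the same pattern from the answer wrapped in a helper. The name `is_hit` is hypothetical, and the letters are the ones from the comment above. The lookahead `(?!.*(x).*\1)` captures an allowed letter and fails the match if that same letter can be found again later in the word.

```python
import re

letters = "ACEHKTS"  # letters from the comment above
charset = f"[{re.escape(letters)}]"

# (?!.*(c).*\1)  -> reject the word if some allowed letter c occurs twice
# {charset}+$    -> every character must be one of the allowed letters
pattern = re.compile(rf"(?!.*({charset}).*\1){charset}+$", re.IGNORECASE)


def is_hit(word):
    """True if the word uses only the allowed letters, each at most once."""
    return pattern.match(word) is not None
```

Here `is_hit("CET")` is true and `is_hit("CETT")` is false, because the second T satisfies the backreference inside the lookahead.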
0

I'm not sure exactly which regular expression you want to use, but it is straightforward to build one with simple string formatting. You can also change the constructor to accept an arbitrary number of patterns to search for. Hope this helps a little.

import re
class WordCross:
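    def __init__(self, *patterns):
        # Join the letters into one alternation, e.g. (A|B|C|D|E|F)
        list_of_patterns = "|".join(patterns)
        reg_exp = r"({0})".format(list_of_patterns)
        print(reg_exp)
        hits = []
        # A with block closes the file even if an error occurs
        with open("english3.txt", "r") as word_file:
            for line in word_file:
                word = line.strip()  # drop the trailing newline
                # IGNORECASE so that "A" also matches "a"
                if re.search(reg_exp, word, re.IGNORECASE):
                    hits.append(word)
        hits.sort()
        print(hits)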

test = WordCross("A", "B", "C", "D", "E", "F")
adamkgray
0

I'm assuming the words in your file are line-separated.

Code:

import re
from io import StringIO

source = '''
RegExr was created by gskinner.com, and is proudly hosted by Media Temple.
Edit the Expression & Text to see matches. Roll over matches or the expression for details. PCRE & JavaScript flavors of RegEx are supported. Validate your expression with Tests mode.
The side bar includes a Cheatsheet, full Reference, and Help. You can also Save & Share with the Community, and view patterns you create or favorite in My Patterns.
Explore results with the Tools below. Replace & List output custom results. Details lists capture groups. Explain describes your expression in plain English.
'''.split()  # assuming words are line-separated here.

file_simulation = StringIO('\n'.join(source))  # simulating file open


class WordCross:
    def __init__(self, *args):
        self.file = file_simulation
        self.hits = []

        for words in self.file:
            if re.search(f"[{''.join(args)}]", words.upper()):
                self.hits.append(words.strip())

        self.hits.sort()
        print(self.hits)


test = WordCross("A", "B", "C", "D", "E", "F")

Result:

['Cheatsheet,', 'Community,', ... 'view', 'was']

Process finished with exit code 0

jupiterbjy
0

A couple of suggestions:

  • I don't see anything meriting a class here. A simple function should suffice.

  • Don't use file as a variable name; it was the name of a builtin in Python 2 and still reads like one.

  • When using an open file handle, it's generally better to do so within a with block.

Untested:

import re
def WordCross(*patterns):
    pattern = "|".join(patterns)
    c_pattern = re.compile(pattern, re.IGNORECASE)
    with open("english3.txt") as fp:
        hits = [line for line in fp if c_pattern.search(line)]
    print(sorted(hits))
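
Along the same lines, here is a sketch that applies the three suggestions but returns the hits instead of printing them, which makes it easier to test. Note that it deliberately swaps the `"|".join(...)` plus `search` approach for a character class with `fullmatch`, so that only words made up entirely of the given letters are kept. The function name `find_words` and its `path` parameter are assumptions for illustration.

```python
import re


def find_words(path, *letters):
    """Return the sorted words in the file at `path` that consist only of `letters`."""
    # Character class of the allowed letters, e.g. [AXDELP]
    charset = "[" + "".join(re.escape(ch) for ch in letters) + "]"
    pattern = re.compile(charset + "+", re.IGNORECASE)
    # The with block closes the file handle automatically
    with open(path, encoding="utf-8") as fp:
        words = [line.strip() for line in fp]  # one word per line, newline removed
    # fullmatch requires the whole word to match; empty lines never match "+"
    return sorted(word for word in words if pattern.fullmatch(word))
```

With the word list from the question, this could be called as `find_words("english3.txt", "A", "X", "D", "E", "L", "P")`.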
Rory Browne