How to limit the results of a Python if-in statement when checking if a string is found in another string?

Question

I wrote a Python for loop that goes through each word in the English language (from nltk.corpus import words), and prints words made only of 6 letters provided by the user. The 6 user inputs are stored in a list named characters, so the for loop compares the items from the list to each string (english words).

The problem is that words are printed that contain multiple characters of the same character. For example, if the characters are 'u, l, c, i , e, n', words with multiple letters such as "icicle" are returned. How to I prevent the script from returning words with duplicate letters?

characters = [input1, input2, input3, input4, input5, input6]

for word in word_list:
    word = word.lower()
    if len(word) == 3:
        if word[0] in characters and word[1] in characters and word[2] in characters:
            print(word)
    elif len(word) == 4:
        if word[0] in characters and word[1] in characters and word[2] in characters and word[3] in characters:
            print(word)
    elif len(word) == 5:
        if word[0] in characters and word[1] in characters and word[2] in characters and word[3] in characters and word[4] in characters:
            print(word)
    elif len(word) == 6:
        if word[0] in characters and word[1] in characters and word[2] in characters and word[3] in characters and word[4] in characters and word[5] in characters:
            print(word)

I know the code is inefficiently written, so I'd appreciate tips on improvement as well. An example of the results of the above script is:

eel
eileen
eli
ell
elle
ellen
ellice
encell
ennui
eunice
ice
iceni
icicle
ilicic
ilicin
ill
inn
inulin

Does this answer your question? [Implementing an algorithm to determine if a string has all unique characters](https://stackoverflow.com/questions/17357370/implementing-an-algorithm-to-determine-if-a-string-has-all-unique-characters) — ggorlen, May 27 '20 at 22:01
`len(set(foo)) == len(foo)` ensures `foo` is entirely unique characters. As for string comparisons, just `all(x in y for x in foo)` instead of checking each index by hand. `if 3 <= len(foo) <= 6` saves a lot of branches. — ggorlen, May 27 '20 at 22:05

score 1 · Accepted Answer · answered May 27 '20 at 22:02

1

This is untested since I have no test data, but should do:

characters = [input1, input2, input3, input4, input5, input6]

for word in word_list:
    word = word.lower()
    isIn = True
    for c in word:
        if c not in characters or word.count(c) != 1:
            isIn = False
    if isIn:
        print(word)

answered May 27 '20 at 22:02

HackLab

509
3
10

This worked perfectly, and was an efficient solution. Thank you! – ttmcl98 May 28 '20 at 02:23
This is actually a fair amount more inefficient than can be done, since you're looping through all the letters in word two levels deep; you're looping through each character in word, and with each iteration, you're potentially looping through each character in word *again* when you call word.count. That's O(n^2) time complexity, where n is the average length of a word. English words are relatively rather short in terms of data size, so this isn't that much of a problem in this case, but I caution against using this approach in all other settings. Look at my answer for a (on avg) O(n) solution. – Anonymous1847 Jun 03 '20 at 20:53

score 1 · Answer 2 · edited Jun 20 '20 at 09:12

I don't know this package, but it sounds that your word list is big. You should use a keyword tree instead of looping through the whole list everytime, when new letters are given. It is possible that this package contains better data structures for accessing those words, if not then you should transform it into a Trie. It is a one-time task and after it, lookup times become faster for every input.

Answering your question, you can make a dictionary, what maps the input letters with their quantities. For example:

input = {'a':1, 'b':2, 'c':1}

Then, if you are looping on each word, costly you can count each letter. If you are using a Trie, then you only need to go over on children and make a recursive call if

input[children's letter] != 0

before the recursive call, you need to decrement that value, and after call increment it.

This way, you only go over on the words that starts the same as your letters instead of going over every word, every time.

Hope it helps :)

Counting the letters, I mean: `word.count(letter) == input[letter]` similarly to @HackLab 's answer — Levente Simofi, May 27 '20 at 22:30

InstaK0 · Answer 3 · 2020-05-27T22:52:56.457

My first thought about the efficiency is:

def test_word(word, characters):
    for i in range(len(word)):
        if word[i] not in characters:  # Does everything in 2 lines :)
            return False
    return True

This function returns False if the word has letters not in the list "characters", and True otherwise. The reason I used a function is simply because it is neater and you can run the code from any point in your program easily. Make sure you use a copy of the list "characters" if you need to use it in the future:

copy_of_chars = characters.copy()
test_word(word, copy_of_chars)

About the duplicate letters- I would delete any letter in the list that has been "found":

def test_word(word, characters):
    for i in range(len(word)):
        if word[i] not in characters:
            return False
        characters.pop(characters.index([word[i]]))  # Removes the letter from the list "characters"
    return True

This function will return False if the word has characters not in the list characters, or if it has multiple letters when only one can be found in the list "characters". Otherwise it will return True.

Hope this helps!

You cannot pop characters or strings like word[i] from lists. pop() takes integer indices. You need to find then remove, which is expensive anyway. — Anonymous1847, May 27 '20 at 22:32

Anonymous1847 · Answer 4 · 2020-05-27T22:52:42.983

You can use collections.Counter.

from collections import Counter

Then, to get Counter objects (essentially multisets) which count how many times each character occurs in the word and in the inputted allowed characters:

word_counter = Counter(word)
characters_counter = Counter(characters)

To check that the word is a subset of the characters, and print if so, do

if word_counter & characters_counter == word_counter:
    print(word)

(& means intersection)

Very simple. Quick, because it uses standard library functionality hash maps that are optimized and probably written in C, instead of costly multiple-level list loops and finds and additions and removals. It also has the added benefit that if a user enters the same characters multiple times, then it will allow words with that character repeated multiple times, up to however many times the user entered it.

For example, if the user entered "i, i, c, c, l, e" then the word "icicle" would still be printed, whereas if they entered "i, i, c, z, l, e" then "icicle" would not be printed.

from collections import Counter

# input characters, get words...

characters_counter = Counter(characters)
for word in word_list:
    word_counter = Counter(word)
    if word_counter & characters_counter == word_counter:
        print(word)

Done!

score -1 · Answer 5 · answered May 27 '20 at 22:26

-1

Didn't test it:

   for word in word_list:
        if word < 6:
            if all(letter in character for letter in list(word.lower()):
                print(word)

answered May 27 '20 at 22:26

Dratius

49
4

1

This does not fix the asker's problem, with words like "icicle" still being printed. – Anonymous1847 May 27 '20 at 22:29

How to limit the results of a Python if-in statement when checking if a string is found in another string?

5 Answers5