Given a list of arbitrary words, how to remove all words with double letters?

Question

Similar to this post, but I want to perform this task on an arbitrary list of words supplied by an external source, which may change according to user input. For example, I may have:

input = ['annotate','color','october','cellular','wingding','appeasement','sorta']

and the output ought to be

output = ['color','october','wingding','sorta']

Any help would be appreciated!

Could you show an example of what exactly you define as arbitrary in your program, because it is not clear why the answer to the post you have given would not work in the case you have shown? — Amolgorithm, Jul 16 '23 at 02:39
The answer to the post I linked only checks for repeat patterns of 'd' and 'r'. Would it work to just throw the whole alphabet into that regex? I'm not super savvy with regex. — Ben S., Jul 16 '23 at 02:41
I have a doubt regarding your exact requirements. `cellular` would be removed, but would `cel_lular` be removed? — Amolgorithm, Jul 16 '23 at 03:00

Amolgorithm · Answer 1 · 2023-07-17T16:41:18.620

5

You can do this with a simple regular expression. This answer extends the one in the post you mentioned so that it covers any repeated character, not just specific letters.

import re

arr = ['annotate','color','october','cellular','wingding','appeasement','sorta']
result = [w for w in arr if not re.search(r'(\w)\1', w)]

print(result)

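The pattern `(\w)\1` matches any word character that is immediately followed by the same character. Word characters include a-z, A-Z, 0-9 and _ (and, for `str` patterns in Python 3, other Unicode letters and digits as well).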
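
A minimal sketch of what that character class implies in practice: doubled digits or underscores also count as "double letters", an underscore between two identical letters prevents a match, and matching is case-sensitive unless a flag such as `re.IGNORECASE` is passed. The helper name `has_double` and the letters-only pattern `([^\W\d_])\1` are assumptions added for illustration; they are not part of the answer above.

```python
import re

def has_double(word, pattern=r'(\w)\1', flags=0):
    """Return True if `word` contains a character matched by `pattern`
    that is immediately followed by the same character."""
    return re.search(pattern, word, flags) is not None

print(has_double('cellular'))                     # True
print(has_double('cel_lular'))                    # False: '_' separates the two l's
print(has_double('item100'))                      # True: \w also matches digits
print(has_double('item100', r'([^\W\d_])\1'))     # False: letters-only variant
print(has_double('Aaron'))                        # False: 'A' and 'a' differ
print(has_double('Aaron', flags=re.IGNORECASE))   # True
```

If only alphabetic doubles should disqualify a word, the letters-only variant may be the closer fit; which behavior is wanted depends on what the external word source can contain.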

edited Jul 17 '23 at 16:41

answered Jul 16 '23 at 02:45

Amolgorithm


[`input` is *not* a keyword](https://docs.python.org/3/reference/lexical_analysis.html#keywords). Reusing a predefined global variable is not necessarily an issue either, if some care is taken. — Also, you can remove the `+` quantifier from the regular expression. – Konrad Rudolph Jul 17 '23 at 16:32
@KonradRudolph Ah really? Seems I am mistaken, thank you for informing me of this. I will edit my post accordingly. – Amolgorithm Jul 17 '23 at 16:34

score 4 · Answer 2 · answered Jul 16 '23 at 02:47


You can use a regular expression to do this.

import re

# `input` is the list of words from the question.
list(filter(lambda x: not re.search(r'(\w)\1', x), input))


NaNisNumber


score 2 · Answer 3 · answered Jul 16 '23 at 02:43

Use nested loops to check each word for double letters:

This solution may be easier to understand than a regex-based solution.

words = ['annotate','color','october','cellular','wingding','appeasement','sorta']
result = []

for word in words:
    last = None
    has_double = False
    for letter in word:
        if letter == last:
            has_double = True  # Double letter found, so stop scanning this word.
            break
        last = letter
    if not has_double:
        result.append(word)  # Keep only words without double letters.
print(result)

Output:

['color', 'october', 'wingding', 'sorta']

This code checks each word for double letters and copies it to the result list only if it has none. Building a new list avoids deleting from words while iterating over it, which would shift the remaining items and cause some words to be skipped. Note that naming a variable input is legal, but it shadows the built-in input() function.
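
As a side note on loop-based filtering in general, removing items from a list while iterating over it can silently skip elements. The sketch below illustrates this under stated assumptions: the function names `remove_in_place` and `keep_in_new_list` and the three-word sample are hypothetical and chosen only to make the effect visible.

```python
def has_double(word):
    """True if any two adjacent characters in `word` are equal."""
    return any(a == b for a, b in zip(word, word[1:]))

def remove_in_place(words):
    """Deliberately fragile: deletes from the list it is iterating over."""
    words = list(words)  # work on a copy so the caller's list is untouched
    for index, word in enumerate(words):
        if has_double(word):
            del words[index]  # later items shift left; the next one is skipped
    return words

def keep_in_new_list(words):
    """Builds a new list, so iteration is never disturbed."""
    return [word for word in words if not has_double(word)]

sample = ['annotate', 'cellular', 'color']
print(remove_in_place(sample))   # ['cellular', 'color']
print(keep_in_new_list(sample))  # ['color']
```

In the first function, deleting 'annotate' moves 'cellular' into index 0, which the loop has already passed, so it is never checked. Whether this shows up depends on the order of the input, which is why it can go unnoticed on a particular test list.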

score 2 · Answer 4 · answered Jul 16 '23 at 02:49

You can compare adjacent characters by zipping a word with itself. This makes for a fairly succinct list comprehension:

words = ['annotate','color','october','cellular','wingding','appeasement','sorta']

[w for w in words if all(a != b for a, b in zip(w, w[1:]))]    
# ['color', 'october', 'wingding', 'sorta']

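zip(w, w[1:]) pairs each letter with the one that follows it, yielding tuples such as ('a', 'n'), ('n', 'n'), ('n', 'o'), ... for 'annotate'. Any pair whose two letters are equal means the same letter appears twice in a row, so all(a != b ...) keeps only the words in which every pair differs.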
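
To make the pairing concrete, here is a small sketch that prints the pairs for a short string and wraps the check in a function. The names `adjacent_pairs` and `has_double_letter` are assumptions introduced for illustration.

```python
def adjacent_pairs(word):
    """List each character together with the character that follows it."""
    return list(zip(word, word[1:]))

def has_double_letter(word):
    """True if at least one adjacent pair holds two equal characters."""
    return any(a == b for a, b in zip(word, word[1:]))

print(adjacent_pairs('anno'))         # [('a', 'n'), ('n', 'n'), ('n', 'o')]
print(has_double_letter('annotate'))  # True
print(has_double_letter('color'))     # False
print(has_double_letter('a'))         # False: a single character has no pairs
```

Because zip stops at the shorter input, empty and one-character strings produce no pairs and are therefore kept by the filter.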

score 1 · Answer 5 · answered Jul 16 '23 at 02:43

import re

def filter_words(input_list):
    output_list = []
    pattern = r'(.)\1'  # Regex pattern to match consecutive identical characters
    
    for word in input_list:
        if not re.search(pattern, word):
            output_list.append(word)
    
    return output_list

# Example usage:
input = ['annotate', 'color', 'october', 'cellular', 'wingding', 'appeasement', 'sorta']
output = filter_words(input)
print(output)

Output:

['color', 'october', 'wingding', 'sorta']

Augusto Vasques · Answer 6 · 2023-07-17T17:48:26.580

Still without using regex, a variation of the nested-loop answer can be built on a Python language feature: the else clause on loops.

In Python, a loop statement can have an else clause that runs when the loop finishes by exhausting the iterable, but does not run when the loop is terminated by a break statement. This allows the following algorithm:

input = ['annotate','color','october','cellular','wingding','appeasement','sorta', 'watt']
output = []

for word in input:
    for idx, char in enumerate(word[1:]):  # Walk the word from its second character to its last
        if word[idx] == char:              # word[idx] is the character just before char
            break                          # Double letter found: leave the loop, so the else clause is skipped
    else:
        output.append(word)                # Loop ended without break: no double letters

print(output)
# ['color', 'october', 'wingding', 'sorta']
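
To see the for/else behavior in isolation, the following sketch returns the position of the first doubled character, or None when the loop runs to completion. The function name `first_double_index` is hypothetical and not part of the answer.

```python
def first_double_index(word):
    """Return the index where the first doubled character starts, or None."""
    for idx, char in enumerate(word[1:]):
        if word[idx] == char:   # char sits at position idx + 1 in word
            result = idx
            break               # skips the else clause below
    else:
        result = None           # runs only if the loop was never broken
    return result

print(first_double_index('watt'))   # 2
print(first_double_index('color'))  # None
print(first_double_index(''))       # None: the loop body never runs, so else does
```

The else branch also runs when the iterable is empty, which is why empty and single-character words end up in the output list of the algorithm above.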


See also the Python built-in function enumerate().
