Words in a list with consecutively repeated letters

Question

Right now I have a list, for example:

data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']

I want to remove the words that contain repeated letters. In this case, that means removing:

'aa','aac','bbb','bcca','ffffff'

Maybe with import re?

Sorry, I made an edit. The repeated letters can appear anywhere in the word, not only from the first character. — tooooony, Nov 16 '17 at 13:03
Please do **not** change your question after you have received valid answers when the change invalidates those answers! — PM 2Ring, Nov 16 '17 at 13:07
You might want to put the "consecutively repeated" from your title in your question, and add a word like "abab" to make it clear (assuming I'm right) that you want it to remain. — DSM, Nov 16 '17 at 13:09
Don't edit a question mid-way again. If you continue to do so, your posts will be closed. Changing your question after others have taken the time to solve your original query is very disrespectful to their time. — cs95, Nov 16 '17 at 13:32

PM 2Ring · Answer 1 · 2017-11-16T13:18:56.377

The original version of this question wanted to drop words that consist entirely of repetitions of a single character. An efficient way to do this is to use sets. We convert each word to a set, and if it consists of only a single character the length of that set will be 1. If that's the case, we can drop that word, unless the original word consisted of a single character.

data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff'] 
newdata = [s for s in data if len(s) == 1 or len(set(s)) != 1]
print(newdata)

output

['dog', 'cat', 'a', 'aac', 'bcca']

Here's code for the new version of your question, where you want to drop words that contain any repeated characters. This one's simpler, because we don't need a special test for one-character words.

data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff'] 
newdata = [s for s in data if len(set(s)) == len(s)]
print(newdata)

output

['dog', 'cat', 'a']
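Note that this set test drops a word whenever a letter occurs more than once, whether or not the occurrences are adjacent. A minimal sketch of that behaviour follows; the words 'abab' and 'wow' are not in the original list and are added here only for illustration.

```python
# 'abab' and 'wow' are extra illustrative words, not part of the question's list.
data = ['dog', 'cat', 'aa', 'abab', 'wow']

# Keep a word only if every letter in it is distinct.
no_repeats = [s for s in data if len(set(s)) == len(s)]
print(no_repeats)  # ['dog', 'cat']
```

'abab' and 'wow' are dropped even though their repeated letters are not next to each other.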

If the repetitions have to be consecutive, we can handle that using groupby.

from itertools import groupby

data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff', 'abab', 'wow'] 
newdata = [s for s in data if max(len(list(g)) for _, g in groupby(s)) == 1]
print(newdata)

output

['dog', 'cat', 'a', 'abab', 'wow']
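To see why this works, it may help to look at what groupby produces: one (key, group) pair per run of equal adjacent characters. The sketch below wraps the idea in a helper, `longest_run`, which is a hypothetical name introduced here. It also passes `default=0` to `max`, because calling `max` on an empty sequence raises ValueError, which is what the one-liner above would do if the list contained an empty string.

```python
from itertools import groupby

def longest_run(word):
    """Length of the longest run of identical adjacent characters (0 for '')."""
    # groupby yields one (key, group) pair per run of equal adjacent characters.
    return max((sum(1 for _ in run) for _, run in groupby(word)), default=0)

# The runs in 'bcca': a single 'b', two 'c's, a single 'a'.
print([(ch, len(list(run))) for ch, run in groupby('bcca')])
# [('b', 1), ('c', 2), ('a', 1)]

# The empty string is included only to show the default=0 case.
data = ['dog', 'bcca', 'abab', '']
print([s for s in data if longest_run(s) <= 1])
# ['dog', 'abab', '']
```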

sets do not consider repetitive letters in the word so it is not sufficient. — taras, Nov 16 '17 at 13:05
@PM2Ring Yep, upvoted. However, I think the OP was unclear from the beginning, changing the content of his question multiple times. — Anton vBR, Nov 16 '17 at 13:21

Anton vBR · Accepted Answer · 2017-11-16T13:09:46.917

Thanks to this thread: Regex to determine if string is a single repeating character

Here is the re version, but I would stick to PM 2Ring's and Tameem's solutions if the task is as simple as this:

import re
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']  
[i for i in data if not re.search(r'^(.)\1+$', i)]

Output

['dog', 'cat', 'a', 'aac', 'bcca']

And for the edited question, dropping every word in which a letter is immediately followed by itself:

import re
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']  
[i for i in data if not re.search(r'(\w)\1', i)]

Output

['dog', 'cat', 'a']
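This second pattern tests for adjacent repeats only. On the sample list that gives the same result as the set test, but the two differ on words such as 'abab' or 'wow'. A minimal sketch with a precompiled pattern follows; the name `DOUBLED` is hypothetical, and using `.` instead of `\w` (so that any doubled character counts, not only word characters) is an assumption about what is wanted.

```python
import re

# Hypothetical name; matches any character immediately followed by itself.
DOUBLED = re.compile(r'(.)\1')

data = ['dog', 'cat', 'a', 'aa', 'aac', 'bbb', 'bcca', 'ffffff', 'abab', 'wow']

# Keep only the words with no doubled character anywhere.
kept = [w for w in data if not DOUBLED.search(w)]
print(kept)  # ['dog', 'cat', 'a', 'abab', 'wow']
```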

edited Nov 16 '17 at 13:09

answered Nov 16 '17 at 13:05

Anton vBR


Not familiar with Regex, so maybe will just use the loop and sets for now. – tooooony Nov 16 '17 at 13:09
@JieNiu But that's where you are wrong. If you have more of these tasks Regex is the only place to go – Anton vBR Nov 16 '17 at 13:10
@JieNiu The answer you chose doesn't handle words like `'wow'` – Anton vBR Nov 16 '17 at 13:13
Cool. Will take a look at the reference on Regex. Thanks. – tooooony Nov 16 '17 at 13:18
@AntonvBR: It's by far not the "only place to go" but it can be a very powerful tool indeed. – Eric Duminil Nov 16 '17 at 13:22

score 1 · Answer 3 · answered Nov 16 '17 at 13:12

A loop is the way to go. Forget about sets for this: a set cannot tell you whether the repeated letters in a word are adjacent.

Here is a function you can use to check whether a word is valid in a single pass:

def is_valid(word):
    last_char = None
    for i in word:
        if i == last_char:
            return False

        last_char = i

    return True

Example

In [28]: is_valid('dogo')
Out[28]: True

In [29]: is_valid('doo')
Out[29]: False
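Applied to the list from the question, the check can be used in a list comprehension. The function is repeated here (with the loop variable renamed to `ch`) so that the snippet runs on its own:

```python
def is_valid(word):
    """Return True if no two adjacent characters in word are equal."""
    last_char = None
    for ch in word:
        if ch == last_char:
            return False
        last_char = ch
    return True

data = ['dog', 'cat', 'a', 'aa', 'aac', 'bbb', 'bcca', 'ffffff']
print([w for w in data if is_valid(w)])  # ['dog', 'cat', 'a']
```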

Eric Duminil · Answer 4 · 2017-11-16T13:27:14.483

Here's a way to check if there are consecutive repeated characters:

def has_consecutive_repeated_letters(word):
    return any(c1 == c2 for c1, c2 in zip(word, word[1:]))

You can then use a list comprehension to filter your list:

words = ['dog','cat','a','aa','aac','bbb','bcca','ffffff', 'abab', 'wow']
[word for word in words if not has_consecutive_repeated_letters(word)]
# ['dog', 'cat', 'a', 'abab', 'wow']

edited Nov 16 '17 at 13:27

answered Nov 16 '17 at 13:21

Eric Duminil


score 0 · Answer 5 · answered Nov 16 '17 at 13:04

One line is all it takes :)

data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']  
data = [value for value in data if len(set(value)) != 1 or len(value) == 1]
print(data)

Output

['dog', 'cat', 'a', 'aac', 'bcca']


Tameem

