0

Right now I have a list like this, for example:

data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']  

I want to remove the words that contain repeated letters. In this example, that means removing:

'aa','aac','bbb','bcca','ffffff'

Maybe import re?

tooooony
  • Sorry, I made an edit. The repeated letters don't have to start at the first character; they can occur anywhere in the word. – tooooony Nov 16 '17 at 13:03
  • Please do **not** change your question after you have received valid answers when the change invalidates those answers! – PM 2Ring Nov 16 '17 at 13:07
  • You might want to put the "consecutively repeated" from your title in your question, and add a word like "abab" to make it clear (assuming I'm right) that you want it to remain. – DSM Nov 16 '17 at 13:09
  • Don't edit a question mid-way again. If you continue to do so, your posts will be closed. Changing your question after others have taken the time to solve your original query is very disrespectful to their time. – cs95 Nov 16 '17 at 13:32

5 Answers

1

The original version of this question wanted to drop words that consist entirely of repetitions of a single character. An efficient way to do this is to use sets. We convert each word to a set, and if it consists of only a single character the length of that set will be 1. If that's the case, we can drop that word, unless the original word consisted of a single character.

data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff'] 
newdata = [s for s in data if len(s) == 1 or len(set(s)) != 1]
print(newdata)

output

['dog', 'cat', 'a', 'aac', 'bcca']

Here's code for the new version of your question, where you want to drop words that contain any repeated characters. This one's simpler, because we don't need a special test for one-character words.

data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff'] 
newdata = [s for s in data if len(set(s)) == len(s)]
print(newdata)

output

['dog', 'cat', 'a']

If the repetitions have to be consecutive, we can handle that using groupby.

from itertools import groupby

data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff', 'abab', 'wow'] 
newdata = [s for s in data if max(len(list(g)) for _, g in groupby(s)) == 1]
print(newdata)

output

['dog', 'cat', 'a', 'abab', 'wow']
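
One caveat with the max-based test above: it raises ValueError on an empty string, because max receives an empty sequence. Here is a minimal sketch of a variant that tolerates empty strings by using all instead. The helper name no_consecutive_repeats is made up for illustration, and the empty string at the end of the list is added only to show the edge case.

```python
from itertools import groupby

def no_consecutive_repeats(s):
    # groupby yields one group per run of equal adjacent characters.
    # A word passes if every run has length 1. all() of an empty
    # iterable is True, so the empty string passes as well.
    return all(len(list(g)) == 1 for _, g in groupby(s))

data = ['dog', 'cat', 'a', 'aa', 'aac', 'bbb', 'bcca', 'ffffff', 'abab', 'wow', '']
newdata = [s for s in data if no_consecutive_repeats(s)]
print(newdata)
```

Whether an empty string should count as valid depends on your data; if it shouldn't, filter it out separately.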
PM 2Ring
1

Thanks to this thread: Regex to determine if string is a single repeating character

Here is the re version, but I would stick to PM 2Ring's and Tameem's solutions if the task is as simple as this:

import re
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']  
[i for i in data if not re.search(r'^(.)\1+$', i)]

Output

['dog', 'cat', 'a', 'aac', 'bcca']

And for the other version, which drops any word containing a consecutively repeated character:

import re
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']  
[i for i in data if not re.search(r'((\w)\2{1,})', i)]

Output

['dog', 'cat', 'a']
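
Note that \w only matches letters, digits and underscore, so the pattern above would not catch a repeated hyphen or other punctuation. As a rough sketch, assuming you want any repeated character to count, the pattern can be reduced to (.)\1 and compiled once. The name CONSECUTIVE_REPEAT and the extra test words 'abab', 'wow' and 'a--b' are only for illustration.

```python
import re

# (.) captures any single character (except a newline); \1 requires the
# same character again immediately afterwards, i.e. a consecutive repeat.
CONSECUTIVE_REPEAT = re.compile(r'(.)\1')

data = ['dog', 'cat', 'a', 'aa', 'aac', 'bbb', 'bcca', 'ffffff', 'abab', 'wow', 'a--b']
kept = [w for w in data if not CONSECUTIVE_REPEAT.search(w)]
print(kept)
```

Here 'abab' and 'wow' are kept because their repeated letters are not adjacent, while 'a--b' is dropped because of the doubled hyphen.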
Anton vBR
1

A loop is the way to go. Forget about sets here: a set ignores order, so it cannot tell whether the repeated letters in a word are consecutive.

Here is a function you can use to check whether a word is valid in a single pass:

def is_valid(word):
    last_char = None
    for i in word:
        if i == last_char:
            return False

        last_char = i

    return True

Example

In [28]: is_valid('dogo')
Out[28]: True

In [29]: is_valid('doo')
Out[29]: False
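
To apply this to the list from the question, the function can be used as the filter in a list comprehension. This is just a sketch; is_valid is repeated here so the snippet runs on its own, with the loop variable renamed to ch.

```python
def is_valid(word):
    # Remember the previous character and bail out on the first
    # character that equals it.
    last_char = None
    for ch in word:
        if ch == last_char:
            return False
        last_char = ch
    return True

data = ['dog', 'cat', 'a', 'aa', 'aac', 'bbb', 'bcca', 'ffffff']
newdata = [word for word in data if is_valid(word)]
print(newdata)
```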
taras
1

Here's a way to check if there are consecutive repeated characters:

def has_consecutive_repeated_letters(word):
    return any(c1 == c2 for c1, c2 in zip(word, word[1:]))

You can then use a list comprehension to filter your list:

words = ['dog','cat','a','aa','aac','bbb','bcca','ffffff', 'abab', 'wow']
[word for word in words if not has_consecutive_repeated_letters(word)]
# ['dog', 'cat', 'a', 'abab', 'wow']
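
In case the zip trick isn't obvious: zipping a word with itself shifted by one character yields each pair of neighbouring letters. A small illustration using 'bcca' from the list:

```python
word = 'bcca'

# Pair each character with the one that follows it.
pairs = list(zip(word, word[1:]))
print(pairs)  # [('b', 'c'), ('c', 'c'), ('c', 'a')]

# The word has a consecutive repeat if any pair holds two equal letters.
has_repeat = any(c1 == c2 for c1, c2 in pairs)
print(has_repeat)  # True
```

For a one-character word the shifted copy is empty, so zip yields no pairs and any returns False.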
Eric Duminil
0

One line is all it takes :)

data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']  
data = [value for value in data if len(set(value)) != 1 or len(value) == 1]
print(data)

Output

['dog', 'cat', 'a', 'aac', 'bcca']
Tameem