Right now I have a list, for example:
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']
I want to remove the words with repeated letters. In this example, the words I want to remove are:
'aa','aac','bbb','bcca','ffffff'
Maybe with import re?
The original version of this question wanted to drop words that consist entirely of repetitions of a single character. An efficient way to do this is to use sets. We convert each word to a set, and if it consists of only a single character the length of that set will be 1. If that's the case, we can drop that word, unless the original word consisted of a single character.
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']
newdata = [s for s in data if len(s) == 1 or len(set(s)) != 1]
print(newdata)
output
['dog', 'cat', 'a', 'aac', 'bcca']
Here's code for the new version of your question, where you want to drop words that contain any repeated characters. This one's simpler, because we don't need a special test for one-character words.
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']
newdata = [s for s in data if len(set(s)) == len(s)]
print(newdata)
output
['dog', 'cat', 'a']
If the repetitions have to be consecutive, we can handle that using groupby.
from itertools import groupby
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff', 'abab', 'wow']
newdata = [s for s in data if max(len(list(g)) for _, g in groupby(s)) == 1]
print(newdata)
output
['dog', 'cat', 'a', 'abab', 'wow']
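To see why the `max(...) == 1` test works, it may help to look at the runs that `groupby` produces for a word. This is only a small sketch; the helper name `run_lengths` is made up for illustration and is not part of the answer above.

```python
from itertools import groupby

def run_lengths(word):
    # With no key function, groupby groups consecutive equal characters,
    # so each group is one "run" of the same letter.
    return [(ch, len(list(group))) for ch, group in groupby(word)]

print(run_lengths('bcca'))  # [('b', 1), ('c', 2), ('a', 1)]
print(run_lengths('abab'))  # [('a', 1), ('b', 1), ('a', 1), ('b', 1)]
```

A word has no consecutive repetition exactly when every run has length 1, which is why 'abab' and 'wow' survive the filter while 'bcca' does not.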
Thanks to this thread: Regex to determine if string is a single repeating character
Here is the re version, but I would stick with PM2 ring's and Tameem's solutions if the task is as simple as this:
import re
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']
[i for i in data if not re.search(r'^(.)\1+$', i)]
Output
['dog', 'cat', 'a', 'aac', 'bcca']
And for the other version of the question (dropping words that contain repeated characters):
import re
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']
[i for i in data if not re.search(r'((\w)\2{1,})', i)]
Output
['dog', 'cat', 'a']
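One thing to watch: the second pattern only matches repeats that sit next to each other, which happens to be enough for this sample list. If a word like 'abab' should also be dropped, a backreference with a gap is one option. The sketch below assumes that reading of the question; the names `ANY_REPEAT` and `CONSECUTIVE_REPEAT` are hypothetical.

```python
import re

# (.) captures one character, .* allows any gap, \1 requires that character again
ANY_REPEAT = re.compile(r'(.).*\1')
# (.)\1 requires the same character immediately afterwards
CONSECUTIVE_REPEAT = re.compile(r'(.)\1')

data = ['dog', 'cat', 'a', 'aa', 'aac', 'bbb', 'bcca', 'ffffff', 'abab', 'wow']

no_repeats = [w for w in data if not ANY_REPEAT.search(w)]
no_consecutive = [w for w in data if not CONSECUTIVE_REPEAT.search(w)]

print(no_repeats)      # ['dog', 'cat', 'a']
print(no_consecutive)  # ['dog', 'cat', 'a', 'abab', 'wow']
```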
A loop is the way to go. Forget about sets, since they do not work for words with repeated letters.
Here is a function you can use to determine whether a word is valid in a single loop:
def is_valid(word):
    last_char = None
    for i in word:
        if i == last_char:
            return False
        last_char = i
    return True
Example
In [28]: is_valid('dogo')
Out[28]: True
In [29]: is_valid('doo')
Out[29]: False
Here's a way to check if there are consecutive repeated characters:
def has_consecutive_repeated_letters(word):
    return any(c1 == c2 for c1, c2 in zip(word, word[1:]))
You can then use a list comprehension to filter your list:
words = ['dog','cat','a','aa','aac','bbb','bcca','ffffff', 'abab', 'wow']
[word for word in words if not has_consecutive_repeated_letters(word)]
# ['dog', 'cat', 'a', 'abab', 'wow']
One line is all it takes :)
data = ['dog','cat','a','aa','aac','bbb','bcca','ffffff']
data = [value for value in data if len(set(value)) != 1 or len(value) == 1]
print(data)
Output
['dog', 'cat', 'a', 'aac', 'bcca']