How to find duplicate letters and remove all of them from a string in a list

Question

Given a list of names:

name_list = ['Jasonn', 'pPeter', 'LiSsa', 'Joanna']

I want to remove pairs of identical adjacent letters (case-insensitive). For example, name_list[0] should become 'Jaso', and name_list[3] should become 'Jo', because once the 'n's are removed, the two 'a's become adjacent and should also be removed.

Here's my code:

i = 0
for name in name_list:
    ind = name_list.index(name)
    length = len(name)
    for i in range(0,length-1):
        if name[i].lower() == name[i+1].lower():
            name = name_list[ind].replace(name[i], '', 1)
            name = name.replace(name[i], '', 1)
            length -= 2
            if i >= 1 and name[i].lower() == name[i-1].lower():
                name = name_list[ind].replace(name[i], '', 1)
                name = name.replace(name[i-1], '', 1)
        else:
            i += 1
    if ind != len(name_list): 
        print(sep,end='', sep='') #sep is my separator
print()

My code raises an error at runtime. It fails on this line:

if i >= 1 and name[i].lower() == name[i-1].lower():

with:

IndexError: string index out of range

I can't figure out why the index is out of range. My idea was to check that the index is greater than 0, so that i-1 is never negative. For example, given the string 'pPeter', after removing 'pP', I only compare the new letter 'e' at i = 0 with 't' at i+1, since there is no letter before index 0.

For 'Joanna', indexed as 'J[0]o[1]a[2]n[3]n[4]a[5]':

When i = 3, the 'n's at i and i+1 are removed, and the string becomes 'J[0]o[1]a[2]a[3]'.
Since i = 3 > 0 and the characters at i-1 and i are both 'a', the 'a's are removed as well, giving 'Jo'.

Could someone help me figure out where I went wrong?
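Tracing the loop by hand on 'Jasonn' shows where the index runs out. For indices 0 to 3 no adjacent pair matches, so at i = 4 the string is still 'Jasonn'. The two replace calls then shorten it to 'Jaso' (length 4) while i is still 4, so the next check indexes past the end. This snippet replays only that one iteration outside the loop; it is a sketch of the question's steps, not the full program.

```python
# Replays iteration i = 4 of the question's loop for 'Jasonn'.
name = 'Jasonn'
i = 4  # range(0, len(name) - 1) stops at 4; indices 0 to 3 find no pair
name = name.replace(name[i], '', 1)  # deletes the first 'n' -> 'Jason'
name = name.replace(name[i], '', 1)  # name[4] is the other 'n' -> 'Jaso'
print(name, len(name))  # Jaso 4
try:
    name[i].lower()  # what `if i >= 1 and name[i].lower() == ...` evaluates next
except IndexError as exc:
    print('IndexError:', exc)  # IndexError: string index out of range
```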

Would it work to put the `i = 0` inside the loop? Where it is, it's not going to reset for each name. — Ignatius Reilly, Jun 19 '22 at 14:43
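Related to that comment: in a Python `for` loop, the `range` object is created once, before the first iteration, and the loop variable is reassigned from it on every pass. So neither `i += 1` in the `else` branch nor `length -= 2` in the question's code changes which indices are visited. A small demonstration (the `visited` list exists only for illustration):

```python
# The range object is built once, before the first iteration.
length = 4
visited = []
for i in range(0, length - 1):  # range(0, 3)
    visited.append(i)
    length -= 2  # does not shrink the range already created
    i += 1       # overwritten when the loop assigns the next value
print(visited)  # [0, 1, 2]
print(length)   # -2
```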
@blunova That doesn't answer this question. They're similar, but they're not the same. — BrokenBenchmark, Jun 19 '22 at 14:47

score 3 · Answer 1 · answered Jun 19 '22 at 14:42

This approach looks unnecessarily complex.

Instead, you can count how often each letter appears in a name (ignoring case), then keep only the letters that appear exactly once:

from collections import Counter

name_list = ['Jasonn', 'pPeter', 'LiSsa', 'Joanna']
result = []

for name in name_list:
    letter_freqs = Counter(name.lower())
    result.append(''.join(letter for letter in name if letter_freqs[letter.lower()] == 1))

print(result)

This outputs:

['Jaso', 'tr', 'Lia', 'Jo']

answered by BrokenBenchmark

Wow, it really is a concise answer! But if I could, I'd still like to know where my original code went wrong, thanks! – luoluoluo Jun 20 '22 at 03:41

I'm not too sure, but even if it were to be fixed, it'd still be quite slow. Repeated replace operations can take a long time if the string is long. – BrokenBenchmark Jun 20 '22 at 03:44
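On the follow-up question: besides the index running past the end of the shortened string (the `range` is fixed when the loop starts, so `length -= 2` has no effect), `str.replace(name[i], '', 1)` deletes the first occurrence of that character rather than the one at position i, and the first replace of each pair restarts from `name_list[ind]`, the unmodified name, discarding earlier removals. Below is a minimal sketch of the same left-to-right idea without those issues, assuming the goal is to delete adjacent case-insensitive pairs as in the 'Joanna' example; `remove_adjacent_pairs` is a hypothetical helper name.

```python
def remove_adjacent_pairs(name):
    """Delete pairs of adjacent letters that match case-insensitively, repeatedly."""
    chars = list(name)
    i = 0
    # len(chars) is re-evaluated on every pass, unlike a range() built once.
    while i < len(chars) - 1:
        if chars[i].lower() == chars[i + 1].lower():
            # Delete the pair by position (str.replace would hit the first match).
            del chars[i:i + 2]
            # Step back one place: the deletion may have formed a new pair,
            # e.g. 'Joaa' after the 'n's are removed from 'Joanna'.
            i = max(i - 1, 0)
        else:
            i += 1
    return ''.join(chars)


name_list = ['Jasonn', 'pPeter', 'LiSsa', 'Joanna']
print([remove_adjacent_pairs(name) for name in name_list])
# ['Jaso', 'eter', 'Lia', 'Jo']
```

Each `del` still shifts the remaining characters, so this keeps the quadratic worst case the comment above worries about; it is meant to show correct indexing, not to be fast.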

score 1 · Answer 2 · answered Jun 19 '22 at 15:04

With regular expressions, repeatedly deleting adjacent pairs until nothing changes:

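from re import sub, IGNORECASE

name_list = ['Jasonn', 'pPeter', 'LiSsa', 'Joanna']
result = []
for name in name_list:
    # Keep deleting adjacent pairs of the same word character (case-insensitive)
    # until a pass changes nothing; removing one pair can create another,
    # as with 'Joanna' -> 'Joaa' -> 'Jo'.
    while True:
        name2 = sub(r'(\w)(\1)', '', name, flags=IGNORECASE)
        if name2 == name:
            result.append(name2)
            break
        name = name2
print(result)

This outputs:

['Jaso', 'eter', 'Lia', 'Jo']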
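Addressing the speed concern raised in the comments on the first answer: a common alternative, not taken from either answer, is a single left-to-right pass with a stack. Each character triggers exactly one push or one pop, so the work is linear in the length of the name. Like the regex answer, it removes only adjacent pairs, so 'pPeter' becomes 'eter' (the Counter answer, which drops every repeated letter, gives 'tr'). Unlike the `\w` pattern, it also pairs up spaces and punctuation. `remove_pairs_with_stack` is a hypothetical name; for the sample names it produces the same output as the regex loop.

```python
def remove_pairs_with_stack(name):
    """Delete adjacent case-insensitive pairs in one left-to-right pass."""
    stack = []
    for ch in name:
        if stack and stack[-1].lower() == ch.lower():
            # ch cancels the character on top of the stack; popping it may
            # expose an earlier character that pairs with the next one.
            stack.pop()
        else:
            stack.append(ch)
    return ''.join(stack)


name_list = ['Jasonn', 'pPeter', 'LiSsa', 'Joanna']
print([remove_pairs_with_stack(name) for name in name_list])
# ['Jaso', 'eter', 'Lia', 'Jo']
```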
