1

I have a list containing duplicates that may differ in capitalisation. For example:

li = ['Peter', 'PETER']

I tried:

[out.append(x) for x in li if x not in out]

This works for duplicates with identical capitalisation, but not when the capitalisation differs.

Help would be appreciated.

Stanislav Jirak

5 Answers

3

You can "normalize" each name with str.title, then use a set comprehension to narrow the list down to unique items:

>>> names = ['Peter', 'PETER', 'peter', 'Tom', 'TOM', 'Beth', 'beth']
>>> {i.title() for i in names}
{'Tom', 'Beth', 'Peter'}
Cory Kramer
  • This doesn't actually remove duplicates, just gets a list of unique names, all with title case. But it doesn't keep the original casing of the first item – tituszban Jun 28 '21 at 12:48
  • @tituszban And where did you interpret that requirement from the OP's question? And turning this set back into a list is trivial afterwards. `names = list({i.title() for i in names})` that's it. The fact that it creates a new list instead of modifying the existing list is a semantic detail. – Cory Kramer Jun 28 '21 at 12:50
  • It's not tho. OP says remove duplicates. That can indicate keeping the first casing, assuming the casing does matter. By applying `.title()` you assume the OP wants all the names in that casing, which is not in the request. Your approach does two things at once, remove duplicates and set the casing of every result. OP only asked for the first. – tituszban Jun 28 '21 at 12:55
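
To make the two readings discussed in these comments concrete, here is a minimal sketch comparing them side by side. The sample list and the variable names (`normalised`, `first_seen`, `kept`) are only illustrative and do not come from either commenter. It assumes Python 3.7+, where dicts keep insertion order.

```python
names = ['PETER', 'Peter', 'tom', 'Tom']

# Reading 1: normalise every name to title case, then deduplicate.
# A set has no defined order, so the result is sorted to make it stable.
normalised = sorted({n.title() for n in names})

# Reading 2: keep the first spelling seen for each name.
# setdefault only stores a value when the lower-cased key is new.
first_seen = {}
for n in names:
    first_seen.setdefault(n.lower(), n)
kept = list(first_seen.values())

print(normalised)  # ['Peter', 'Tom']
print(kept)        # ['PETER', 'tom']
```

Which of the two is right depends on whether the original casing matters to you.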
1

You need to check with uniform capitalisation:

li = ['Peter', 'PETER']

uniques = []

for item in li:
    # Skip the item if its lower-cased form matches any stored item (also lower-cased)
    if item.lower() in [u.lower() for u in uniques]:
        continue
    uniques.append(item)
tituszban
1

You could transform your list so that duplicates become identical, and then use a set:

li2 = set([x.title() for x in li])

This will eliminate any duplicates, although every name ends up in title case.

Jofre
birgador
  • This doesn't actually remove duplicates, just gets a list of unique names, all with title case. But it doesn't keep the original casing of the first item – tituszban Jun 28 '21 at 12:48
1

If you want to keep the first version:

s = set()
out = []

for x in li:
    if x.lower() not in s:
        # First time this name is seen (ignoring case): keep it and remember it
        out.append(x)
        s.add(x.lower())
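
For reuse, the same keep-first idea could be wrapped in a function. This is only a sketch: `dedupe_ignore_case` is a made-up name, and it swaps `str.lower` for `str.casefold`, which is a more aggressive form of lower-casing intended for caseless comparison (for example, it treats 'Straße' and 'STRASSE' as equal, while `lower` does not).

```python
def dedupe_ignore_case(items):
    """Return items without case-insensitive duplicates, keeping the first spelling."""
    seen = set()   # case-folded keys already encountered
    out = []       # original spellings, in first-seen order
    for item in items:
        key = item.casefold()
        if key not in seen:
            seen.add(key)
            out.append(item)
    return out


print(dedupe_ignore_case(['Peter', 'PETER', 'Tom', 'tom', 'Beth']))
# ['Peter', 'Tom', 'Beth']
```

The set makes each membership check a constant-time lookup on average, so the whole pass is linear in the length of the list.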
user2390182
1

You should ignore capitalization when building your set. When you want to test inclusion in the set you should also ignore the case:

li = ['Peter', 'PETER']
li2 = {l.lower() for l in li}

name = "Peter"

if name.lower() in li2:
  ...
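
As a small usage sketch of the membership test above, assuming you want it as a helper (`known` and `is_known` are hypothetical names, and the sample list is extended with 'Tom' for illustration):

```python
li = ['Peter', 'PETER', 'Tom']

# Build the lookup set once, with every name lower-cased.
known = {name.lower() for name in li}


def is_known(name):
    """Case-insensitive membership test against the lower-cased set."""
    return name.lower() in known


print(is_known('pEtEr'))  # True
print(is_known('Alice'))  # False
```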
Louis Lac