0

I'd like to convert a list of strings to lowercase and remove duplicates while preserving the order. A lot of the single-line Python magic I've found on StackOverflow converts a list of strings to lowercase, but it seems the order is lost.

I've written the code below, which works, and I'm happy to stick with it. But I was wondering whether there is a more Pythonic way of doing it with less code (and potentially fewer bugs, should I write something similar in the future; this one took me quite a while to write).

def word_list_to_lower(words):
    """ takes a word list with a special order (e.g. frequency)
    returns a new word list, all in lower case, with no duplicates, preserving order"""

    print("word_list_to_lower")    
    # save orders in a dict
    orders = dict()
    for i in range(len(words)):
        wl = words[i].lower()

        # save index of first occurrence of the word (prioritizing top value)
        if wl not in orders:
            orders[wl] = i

    # contains unique lower case words, but in wrong order
    words_unique = list(set(map(str.lower, words)))

    # reconstruct sparse list in correct order
    words_lower = [''] * len(words)
    for w in words_unique:
        i = orders[w]
        words_lower[i] = w

    # remove blank entries
    words_lower = [s for s in words_lower if s!='']

    return words_lower
zondo
memo

4 Answers

1

Slightly modifying the answer from "How do you remove duplicates from a list whilst preserving order?":

def f7(seq):
    seen = set()
    seen_add = seen.add
    seq = (x.lower() for x in seq)
    return [x for x in seq if not (x in seen or seen_add(x))]
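
To spell out what the last line does: `seen_add(x)` always returns `None`, so `x in seen or seen_add(x)` is truthy only when `x` has already been seen; otherwise it records `x` as a side effect and the item is kept. A more explicit sketch of the same idea, written as a generator (the name `iter_unique_lower` is made up for this illustration):

```python
def iter_unique_lower(words):
    """Yield each word lowercased, skipping any that were already yielded."""
    seen = set()
    for word in words:
        lowered = word.lower()
        if lowered not in seen:
            # first time we meet this lowercase form: remember it and emit it
            seen.add(lowered)
            yield lowered


print(list(iter_unique_lower(['ONE', 'one', 'TWO', 'two', 'THREE', 'three'])))
# ['one', 'two', 'three']
```

This trades the one-liner for readability; membership tests against the set stay constant time on average either way.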
Community
Josh Wilson
  • How is `seen_add(...)` better than `seen.add(...)`? IMO, it's worse. – zondo Jun 12 '16 at 22:16
  • It would be a little more efficient if you used parentheses `()` instead of brackets `[]` when defining `seq`. That is because you create a generator that gives the values on demand instead of a list which needs to store in memory every value. – zondo Jun 12 '16 at 22:19
1

You can also do:

pip install orderedset

and then:

from orderedset import OrderedSet
initial_list = ['ONE','one','TWO','two','THREE','three']
# Lowercase first, then deduplicate, so that 'ONE' and 'one' collapse into one entry
unique_list = list(OrderedSet(x.lower() for x in initial_list))

print(unique_list)
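
If installing a package is not an option, a plain dict can probably play the same role, assuming Python 3.7 or later, where dicts are guaranteed to keep insertion order. A minimal sketch, reusing the variable names from above:

```python
initial_list = ['ONE', 'one', 'TWO', 'two', 'THREE', 'three']

# dict.fromkeys keeps one key per distinct value, in first-seen order
unique_list = list(dict.fromkeys(x.lower() for x in initial_list))

print(unique_list)
# ['one', 'two', 'three']
```

On older interpreters, `collections.OrderedDict.fromkeys` should give the same result.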
dmitryro
0

Just do something like:

initial_list = ['ONE','one','TWO','two']
# Lowercase before deduplicating; note that a plain set does not keep the original order
unique_list = list(set(x.lower() for x in initial_list))

print(unique_list)
dmitryro
  • One of the key points in the question is that the order must be preserved. Your solution does not preserve the order. – zondo Jun 12 '16 at 22:20
0
initial_list = ['ONE','one','TWO','two']
new_list = []
# Plain loop instead of a list comprehension used only for its side effects
for s in initial_list:
    if s.lower() not in new_list:
        new_list.append(s.lower())
SuperNova