
I have a Python script that parses a .txt file and generates the following list:

['test=testTC101', 'test=testTC101', 'test=testTC102', 'test=testTC102', 'test=testTC103', 'test=testTC103', 'test=testTC104', 'test=testTC104', 'test=testTC105', 'test=testTC105', 'test=testTC106', 'test=testTC106', 'test=testTC107', 'test=testTC107']

I need to eliminate the duplicates. How can I achieve that?

Raymond Hettinger
Urmi

2 Answers


Just use a set:

>>> x = ['test=testTC101', 'test=testTC101', 'test=testTC102', 'test=testTC102', 
...      'test=testTC103', 'test=testTC103', 'test=testTC104', 'test=testTC104', 
...      'test=testTC105', 'test=testTC105', 'test=testTC106', 'test=testTC106', 
...      'test=testTC107', 'test=testTC107']
>>> set(x)
set(['test=testTC101', 'test=testTC103', 'test=testTC102', 'test=testTC105', 
'test=testTC104', 'test=testTC107', 'test=testTC106'])
>>>

You can then turn it back into a list with list():

>>> list(set(x))
['test=testTC101', 'test=testTC103', 'test=testTC102', 'test=testTC105', 
'test=testTC104', 'test=testTC107', 'test=testTC106']
>>> # You can also use sorted to order the items
>>> sorted(set(x))
['test=testTC101', 'test=testTC102', 'test=testTC103', 'test=testTC104', 
'test=testTC105', 'test=testTC106', 'test=testTC107']
>>>

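A set cannot contain duplicates, so converting the list to a set removes them. Note, however, that a set is unordered, which is why the items above come back in an arbitrary order.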
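If the original order matters and every item is hashable, a common alternative (a different technique from the set approach, relying on dicts keeping insertion order, which Python guarantees from 3.7 onward) is `dict.fromkeys`. A minimal sketch using the list from the question:

```python
x = ['test=testTC101', 'test=testTC101', 'test=testTC102', 'test=testTC102',
     'test=testTC103', 'test=testTC103', 'test=testTC104', 'test=testTC104',
     'test=testTC105', 'test=testTC105', 'test=testTC106', 'test=testTC106',
     'test=testTC107', 'test=testTC107']

# A set drops duplicates, but its iteration order is arbitrary.
unique_unordered = set(x)

# Dict keys are unique too and (Python 3.7+) keep insertion order,
# so this keeps the first occurrence of each item in its original position.
unique_ordered = list(dict.fromkeys(x))

print(len(x), len(unique_unordered))  # 14 7
print(unique_ordered)
```

Both approaches need hashable items; for lists containing unhashable values (such as nested lists), see the order-preserving helper in the next answer.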


If you need to preserve the original order:

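def best_case_iteruniq(iterable, key=None):
    """Filter duplicates out of iterable, keeping the first occurrence of each.

    Hashable items (or keys) are tracked in a set for fast lookups;
    unhashable ones fall back to a list that is searched linearly.
    In Python 3 the result is a lazy filter object; wrap it in list() if needed.
    """
    memo_hash = set()   # seen hashable items
    memo_else = list()  # seen unhashable items (e.g. lists, dicts)

    # Bind the methods once to avoid attribute lookups on every item.
    add_hash = memo_hash.add
    add_else = memo_else.append

    if key is None:
        # Compare the items themselves.
        def predicate(item):
            try:
                hash(item)

            except TypeError:
                # Unhashable: fall back to equality checks against a list.
                if item in memo_else:
                    return False

                add_else(item)
                return True

            else:
                # Hashable: average constant-time set membership test.
                if item in memo_hash:
                    return False

                add_hash(item)
                return True

    else:
        # Compare key(item) instead, but keep (and return) the original item.
        def predicate(actual_item):
            item = key(actual_item)

            try:
                hash(item)

            except TypeError:
                # Unhashable key: linear scan of previously seen keys.
                if item in memo_else:
                    return False

                add_else(item)
                return True

            else:
                # Hashable key: set lookup.
                if item in memo_hash:
                    return False

                add_hash(item)
                return True

    return filter(predicate, iterable)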

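This is a small tool I wrote some time ago. It removes duplicates while preserving order, and it works on iterables containing both hashable and unhashable elements. Hashable elements are tracked in a set (an average O(1) lookup each), while unhashable elements fall back to a linear scan of the unhashable ones seen so far; that is the best you can do unless the unhashable elements happen to be totally ordered. It also accepts an optional key function, so items can be deduplicated by a derived value. In Python 3, filter returns a lazy iterator, so wrap the result in list() if you need a list.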

It's almost certainly overkill for this scenario, but it's there, and it's free.
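To show what the unhashable fallback and the key parameter buy you, here is a condensed generator-based sketch of the same idea (the name `iter_unique` is hypothetical, not part of the answer). It relies on the fact that a membership test like `[2, 3] in some_set` raises `TypeError` for unhashable values, and switches to equality comparisons against a list in that case.

```python
def iter_unique(iterable, key=None):
    """Yield items in first-seen order, skipping any whose key was already seen."""
    seen_hashable = set()   # average O(1) membership tests
    seen_unhashable = []    # linear scan, only for values that can't be hashed

    for item in iterable:
        marker = item if key is None else key(item)
        try:
            if marker in seen_hashable:   # raises TypeError if unhashable
                continue
            seen_hashable.add(marker)
        except TypeError:
            # e.g. a list or dict: fall back to equality comparisons.
            if marker in seen_unhashable:
                continue
            seen_unhashable.append(marker)
        yield item


mixed = [1, [2, 3], 1, [2, 3], 'a', {'k': 1}, 'a', {'k': 1}]
print(list(iter_unique(mixed)))                    # [1, [2, 3], 'a', {'k': 1}]

words = ['Apple', 'apple', 'Banana', 'APPLE', 'banana']
print(list(iter_unique(words, key=str.lower)))     # ['Apple', 'Banana']
```

As a generator it stays lazy like the filter-based version, and the key function only decides what counts as a duplicate; the original items are what get yielded.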

Veedrac