92

I am curious what would be an efficient way of uniquifying such data objects:

testdata =[ ['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15724032', 'ETH'], ['15481740', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['10307528', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['15481740', 'ETH'], ['15379365', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15379365', 'ETH']
]

Each pair is identified by the numeric string on the left together with the type on the right; two pairs are duplicates only when both parts match. The result should be a list of lists in the same format as testdata, with each unique pair kept only once.

Trenton McKinney
Hellnar
  • 2
    you were dealing with Ether 10 years ago? Wow! How did you know about them back then! – rsc05 Aug 24 '22 at 07:20

7 Answers

164

You can use a set:

unique_data = [list(x) for x in set(tuple(x) for x in testdata)]

You can also see this page which benchmarks a variety of methods that either preserve or don't preserve order.

Mark Byers
  • Do note that you lose the ordering with this method. If order matters, you'll have to sort the result afterwards or remove the duplicates manually. – Wolph Sep 16 '10 at 07:31
  • 1
    I am getting an error: `TypeError: unhashable type: 'list'`. Python 2.6.2, Ubuntu Jaunty. – Manoj Govindan Sep 16 '10 at 07:31
  • @Hellnar: he just updated the code to use a tuple, now you won't get that problem anymore :) – Wolph Sep 16 '10 at 07:32
  • 1
    @Manoj Govindan: The problem occurs because lists aren't hashable and only hashable types can be used in a set. I have fixed it by converting to tuples and then converting back to a list afterwards. Probably though the OP should be using a list of tuples. – Mark Byers Sep 16 '10 at 07:35
  • 1
    @Khan: Python sets are unordered. That doesn't mean you won't get a consistent result from `list(some_set)` but it means that you cannot set or influence the sort order in any way. For more info: https://stackoverflow.com/questions/12165200/order-of-unordered-python-sets – Wolph Mar 03 '19 at 00:22
  • @Wolph: Replace `set` with `dict.fromkeys`, and leave everything else the same, and on CPython/PyPy 3.6+ (or any Python 3.7+), you'll preserve order (the first copy of each duplicated value is kept in the original order, subsequent duplicates are discarded). – ShadowRanger Mar 09 '21 at 02:34
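The order-preserving variant described in the last comment could look roughly like this. It is only a sketch, assuming Python 3.7+ (where dicts keep insertion order); the helper name `unique_pairs` and the shortened sample list are made up for illustration.

```python
def unique_pairs(rows):
    """Return rows without duplicates, keeping the first occurrence of each.

    Assumes Python 3.7+ (insertion-ordered dicts). `unique_pairs` is a
    hypothetical helper name, not something from the answer above.
    """
    # Lists are unhashable, so each row becomes a tuple before it is used as a dict key.
    return [list(key) for key in dict.fromkeys(tuple(row) for row in rows)]


# Shortened sample in the same shape as testdata.
sample = [['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'], ['11111', 'NOT']]
result = unique_pairs(sample)
print(result)  # [['9034968', 'ETH'], ['14160113', 'ETH'], ['11111', 'NOT']]
```

Unlike the `set` version, the output order here follows the input, so no sorting step is needed afterwards.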
11

I tried @Mark's answer and got an error. Converting each element of the list into a tuple made it work. I'm not sure if this is the best way, though.

list(map(list, set(map(tuple, testdata))))

Of course the same thing can be expressed using a list comprehension instead.

[list(i) for i in set(tuple(i) for i in testdata)]
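For anyone wondering where the error came from, here is a small reproduction. It is a sketch on a shortened sample list (the variable names are made up), and the exact wording of the message may differ between Python versions.

```python
pairs = [['9034968', 'ETH'], ['9034968', 'ETH'], ['11111', 'NOT']]

try:
    set(pairs)  # each element is a list, and lists cannot be hashed
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # a message about an unhashable 'list'

# Tuples are hashable, so converting each pair first makes the set work.
unique = set(tuple(p) for p in pairs)
print(len(unique))  # 2
```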

I am using Python 2.6.2.

Update

@Mark has since changed his answer. His current answer uses tuples and will work. So will mine :)

Update 2

Thanks to @Mark. I have changed my answer to return a list of lists rather than a list of tuples.

Manoj Govindan
5

Use unique in numpy to solve this:

import numpy as np

np.unique(np.array(testdata), axis=0)

Note that the axis keyword must be specified; otherwise the input is flattened first. The result is a NumPy array containing the unique rows in sorted order, not in their original order.

Alternatively, use vstack:

np.vstack(list({tuple(row) for row in testdata}))  # recent NumPy versions expect a sequence, so the set is wrapped in a list
Shaido
  • 1
    This option is great because it doesn't limit you to a tuple. You can find unique list of lists of multiple attributes. – wunderkind Mar 07 '23 at 03:30
3

Expanding a bit on @Mark Byers's solution: if a list of tuples is acceptable instead of a list of lists, a single expression is enough:

testdata = list(set(tuple(x) for x in testdata))

Also, if you find comprehensions confusing, as many people do, you can do the same thing with a for loop:

for i, e in enumerate(testdata):
    testdata[i] = tuple(e)
testdata = list(set(testdata))
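Another way to write the loop is with an explicit "seen" set. This keeps the original order, returns lists rather than tuples, and leaves the input untouched. It is a sketch only; `dedupe_in_order` is a made-up name.

```python
def dedupe_in_order(rows):
    """Keep the first occurrence of each row; `dedupe_in_order` is a hypothetical name."""
    seen = set()          # tuples of rows already emitted
    unique_rows = []
    for row in rows:
        key = tuple(row)  # lists are unhashable, tuples are not
        if key not in seen:
            seen.add(key)
            unique_rows.append(list(row))  # copy, so the input rows are not shared
    return unique_rows


sample = [['11111', 'NOT'], ['9555269', 'NOT'], ['11111', 'NOT'], ['15379365', 'ETH']]
deduped = dedupe_in_order(sample)
print(deduped)  # [['11111', 'NOT'], ['9555269', 'NOT'], ['15379365', 'ETH']]
```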
Sam Morgan
2
# Python 3 version: the old `sets` module no longer exists, so the built-in set is used.
testdata =[ ['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15724032', 'ETH'], ['15481740', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['10307528', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['15481740', 'ETH'], ['15379365', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15379365', 'ETH']]
# Join each pair into one string. This relies on every type code being exactly 3 characters long.
concatData = [x[0] + x[1] for x in testdata]
print(concatData)
uniqueSet = set(concatData)
# Split each string back into [number, type] using the fixed 3-character suffix.
uniqueList = [ [t[0:-3], t[-3:]] for t in uniqueSet]
print(uniqueList)
pyfunc
1

If you have a list of objects, then you can modify @Mark Byers's answer to:

unique_data = [list(x) for x in set(tuple(x.testList) for x in testdata)]

where testdata is a list of objects, each of which has a list attribute named testList.
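To make that concrete, here is a minimal sketch. The `Record` class is hypothetical; only the `testList` attribute name comes from the answer.

```python
class Record:
    """Hypothetical container; only the `testList` attribute is taken from the answer."""

    def __init__(self, testList):
        self.testList = testList


records = [
    Record(['9034968', 'ETH']),
    Record(['11111', 'NOT']),
    Record(['9034968', 'ETH']),
]

# Deduplicate on the attribute's contents, not on the objects themselves.
unique_data = [list(x) for x in set(tuple(r.testList) for r in records)]
print(sorted(unique_data))  # [['11111', 'NOT'], ['9034968', 'ETH']]
```

Note that this returns the unique lists, not the original objects. The set does not keep order, hence the `sorted` call for display.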

Khan
1

I was about to post my own take on this until I noticed that @pyfunc had already come up with something similar. I'll post my take on this problem anyway in case it's helpful.

testdata =[ ['9034968', 'ETH'], ['14160113', 'ETH'], ['9034968', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15724032', 'ETH'], ['15481740', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['10307528', 'ETH'], ['15481757', 'ETH'], ['15481724', 'ETH'], ['15481740', 'ETH'], ['15379365', 'ETH'], ['11111', 'NOT'], ['9555269', 'NOT'], ['15379365', 'ETH']
]
flatdata = [p[0] + "%" + p[1] for p in testdata]
flatdata = list(set(flatdata))
testdata = [p.split("%") for p in flatdata]
print(testdata)

Basically, you join the two elements of each pair into a single string using a list comprehension, so that you have a flat list of strings. Strings are hashable, so the list can be passed straight to set(), which removes the duplicates. Then you split each string again to get back the original list-of-lists format. This assumes the "%" separator never appears in the data itself.
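One caveat with joining on a separator: if the separator can occur inside the data, two different pairs may flatten to the same string. The sketch below uses made-up values chosen to trigger that case and compares against the tuple approach from the other answers.

```python
# Made-up pairs that contain the "%" separator: two distinct pairs plus one duplicate.
pairs = [['12', '3%4'], ['12%3', '4'], ['12', '3%4']]

flat = {p[0] + "%" + p[1] for p in pairs}
print(len(flat))  # 1: both distinct pairs flatten to "12%3%4"

# Tuples compare element by element, so no separator is involved.
as_tuples = {tuple(p) for p in pairs}
print(len(as_tuples))  # 2
```

With data like the question's (digits and short type codes) the separator is safe, but tuples avoid the issue entirely.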

I don't know how this compares in terms of performance but it's a simple and easy-to-understand solution I think.
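If you want to check the performance yourself, something like the following would give a rough comparison. It is a sketch only: the synthetic data, the function names, and the repeat count are arbitrary choices, and the timings will vary by machine and Python version.

```python
import timeit

# Synthetic data in the same shape as testdata: 5000 pairs, 500 of them distinct.
rows_for_timing = [[str(i % 500), 'ETH' if i % 2 else 'NOT'] for i in range(5000)]


def via_tuples(rows):
    """Set-of-tuples approach from the accepted answer."""
    return [list(x) for x in set(tuple(x) for x in rows)]


def via_strings(rows):
    """Join-and-split approach from this answer (assumes no '%' in the data)."""
    flat = set(p[0] + "%" + p[1] for p in rows)
    return [s.split("%") for s in flat]


tuple_time = timeit.timeit(lambda: via_tuples(rows_for_timing), number=50)
string_time = timeit.timeit(lambda: via_strings(rows_for_timing), number=50)
print(f"tuples:  {tuple_time:.4f} s")
print(f"strings: {string_time:.4f} s")
```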

Lou