
I have a large table of a couple of million pairs of ints, e.g. [[1,2],[45,101],[22,222], ...]. What is the quickest way in Python to remove duplicates?

Creating an empty list and appending each pair only "if not in" the list doesn't work, since it takes ages. I also tried converting to NumPy and using "isin", but I can't get it to work on pairs.

  • I see you have asked a few questions over the years, but as a refresher, please remember that this is **not a discussion forum**. We [do not want conversational language](https://meta.stackexchange.com/questions/2950) here. As for the question: is it required to use lists for the pairs? This question is trivial if they can be converted to tuples. `numpy.isin` won't help much, because the problem is the algorithm, not Python overhead. – Karl Knechtel Jun 03 '22 at 19:24
  • Please first read https://stackoverflow.com/questions/7961363/removing-duplicates-in-lists. Note that to use the `set` approach, your elements don't have to be ints, but they do have to be *hashable* - sub-lists won't work (which is why I ask about converting to tuple first). If this is enough information to solve the problem, I can mark the question immediately as a duplicate; otherwise, please clarify what else you need to know. – Karl Knechtel Jun 03 '22 at 19:27
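A minimal sketch of the hashability point, using small hypothetical sample data in place of the real table: a list cannot be put into a set, but the same pair as a tuple can, which is what makes the `set` approach work.

```python
# Hypothetical sample data standing in for the real table of pairs.
pairs = [[1, 2], [45, 101], [22, 222], [1, 2]]

# Lists are mutable and therefore unhashable, so they cannot be set members.
try:
    set(pairs)
except TypeError as exc:
    print("set(pairs) raised TypeError:", exc)

# Tuples of ints are hashable, so converting each pair first works.
unique = set(map(tuple, pairs))
print(len(unique))  # 3
```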

3 Answers


You can convert each pair to a tuple (tuples are hashable, lists are not) and put them in a set:

arr = [[1,2],[45,101],[22,222], [1,2]]

arr = set(tuple(i) for i in arr)

If you want to convert it back to a list of lists:

arr = [list(i) for i in arr]
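A `set` does not keep the original order of the pairs. If order matters, one alternative (not part of the answer above) is `dict.fromkeys`, since dicts preserve insertion order from Python 3.7 on. A minimal sketch with the same sample data:

```python
arr = [[1, 2], [45, 101], [22, 222], [1, 2]]

# dict keys must be hashable, so each pair is converted to a tuple first.
# dict.fromkeys keeps each distinct key once, at the position of its first occurrence.
unique = [list(pair) for pair in dict.fromkeys(map(tuple, arr))]
print(unique)  # [[1, 2], [45, 101], [22, 222]]
```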

Probably going to be this: list(set(my_list))

Edit: Whoops. In any case, if whatever iterates over the list can detect duplicates itself, that would be faster than removing duplicates in a separate pass beforehand.
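Detecting duplicates during iteration can be sketched as a generator that remembers which pairs it has already seen. This assumes the consumer only needs each distinct pair once; `iter_unique_pairs` is a hypothetical helper name. A membership check on a `set` is O(1) on average, whereas `x in some_list` scans the whole list, which is why the "if not in" approach from the question becomes quadratic.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def iter_unique_pairs(pairs: Iterable[List[int]]) -> Iterator[List[int]]:
    """Yield each distinct pair the first time it appears (hypothetical helper)."""
    seen: Set[Tuple[int, ...]] = set()
    for pair in pairs:
        key = tuple(pair)  # tuples are hashable, lists are not
        if key not in seen:  # average O(1) lookup, unlike scanning a list
            seen.add(key)
            yield pair


for pair in iter_unique_pairs([[1, 2], [45, 101], [22, 222], [1, 2]]):
    print(pair)
```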

  • This will throw an error "unhashable type: list" because the OP's list is a list of lists. You'd have to first convert the list to a list of tuples. Afterwards you might want to convert the tuples back to lists. – Luatic Jun 03 '22 at 19:25

You could use np.unique() with axis=0, so that each row (pair) is treated as one element:

import numpy as np

np.unique([[1,2],[45,101],[22,222],[22,222]], axis=0)

Output:

array([[  1,   2],
       [ 22, 222],
       [ 45, 101]])

Note that this sorts the rows, so the original order is not preserved.
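If the original order matters, `np.unique` can also return the index of the first occurrence of each unique row via `return_index=True`; sorting those indices and indexing back into the array restores the input order. A minimal sketch, assuming the pairs are already in a 2-D integer array:

```python
import numpy as np

arr = np.array([[1, 2], [45, 101], [22, 222], [22, 222]])

# return_index=True also returns the index of the first occurrence of each unique row.
_, first_idx = np.unique(arr, axis=0, return_index=True)

# Sorting those indices puts the unique rows back in their original order.
unique_in_order = arr[np.sort(first_idx)]
print(unique_in_order.tolist())  # [[1, 2], [45, 101], [22, 222]]
```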
