4

Let's assume I have a list, structured like this with approx 1 million elements:

a = [["a","a"],["b","a"],["c","a"],["d","a"],["a","a"],["a","a"]]

What is the fastest way to remove all elements from a that have the same value at index 0? The result should be

b = [["a","a"],["b","a"],["c","a"],["d","a"]]

Is there a faster way than this:

processed = []
no_duplicates = []

for elem in a:
    if elem[0] not in processed:
        no_duplicates.append(elem)
        processed.append(elem[0])

This works but the appending operations take ages.

dmuensterer
  • 1,875
  • 11
  • 26
  • 1
    I am not exactly sure of the behavior you want to have when a first index is encountered several times, but you could think of using `dict(a)` which will remove duplicates of the keys (i.e. the first element of each row). Then, it shouldn't be too difficult to go back from a dict to an array. – Arthur Bricq Oct 10 '22 at 11:54

3 Answers3

6

you can use set to keep the record of first element and check if for each sublist first element in this or not. it will took O(1) time compare to O(n) time to your solution to search.

>>> a = [["a","a"],["b","a"],["c","a"],["d","a"],["a","a"],["a","a"]]
>>> 
>>> seen = set()
>>> new_a = []
>>> for i in a:
...     if i[0] not in seen:
...             new_a.append(i)
...             seen.add(i[0])
... 
>>> new_a
[['a', 'a'], ['b', 'a'], ['c', 'a'], ['d', 'a']]
>>> 

Space complexity : O(N) Time complexity: O(N) Search if first element there or not : O(1)

In case, no new list to be declared, then use del element, but this will increase time complexity

sahasrara62
  • 10,069
  • 3
  • 29
  • 44
  • O(1) if you are reading through a list? – bn_ln Oct 10 '22 at 12:11
  • O(1) search time, to check if sublist[0] is already seen or not, O(N) time for to go through whole list `a`, so overall here time would be O(N). compare to OP time which is O(N*N) – sahasrara62 Oct 10 '22 at 12:15
  • OK, yes, O(1) for searching the set – bn_ln Oct 10 '22 at 12:34
  • It's worth noting that the append to `new_a` is also O(1) [ref](https://stackoverflow.com/questions/27073596/what-is-the-cost-complexity-of-insert-in-list-at-some-location) – bn_ln Oct 10 '22 at 12:40
  • Yes, better than using del and modify original list. – sahasrara62 Oct 10 '22 at 12:49
  • Thanks, that was exactly what I was looking for. Your solution for my dataset took about 5 sec, while mine took 6 min. Cheers to O(n) instead of O(n^2) – dmuensterer Oct 10 '22 at 13:30
3

You can use a dictionary comprehension using the first item as key. This will work in O(n) time.

out = list({x[0]: x for x in a}.values())

output: [['a', 'a'], ['b', 'a'], ['c', 'a'], ['d', 'a']]

mozway
  • 194,879
  • 13
  • 39
  • 75
1

You can use a list comprehension with a condition and add the first element back to the results like so

[a[0]] + [n for n in a if n != a[0]]

Output

[['a', 'a'], ['b', 'a'], ['c', 'a'], ['d', 'a']]
Patrick
  • 1,189
  • 5
  • 11
  • 19