Fastest way to remove 'directly' repeated items in a Python list?

Question

There are several questions on removing duplicate items from a list, but I am looking for a fast way to remove 'directly' repeated entries from a list:

myList = [1, 2, 3, 3, 2, 4, 4, 1, 4]

should become:

myList = [1, 2, 3, 2, 4, 1, 4]

So the entries which are directly repeated should 'collapse' to a single entry.

I tried:

myList = [1, 2, 3, 3, 2, 4, 4, 1, 4]

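result = []
for i in range(len(myList) - 1):
    # Keep an item only if the next one differs (i.e. it ends a run)
    if myList[i] != myList[i + 1]:
        result.append(myList[i])
# The last item always ends a run, so keep it whenever the list is non-empty
if myList:
    result.append(myList[-1])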

print(result)

This seems to work, but it's a little ugly (especially the way it handles the end of the list) and long.

I'm wondering whether there is a better (shorter) way to do this and, more importantly, whether there is a faster one.
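Since the question is mainly about speed, here is a minimal timing sketch using the standard `timeit` module. It wraps the loop above in a function (`collapse_loop`, a hypothetical name) with one change: the last item is always kept, because a final check of `myList[-1] != myList[-2]` drops it when the list ends in a run such as [1, 2, 2]. For comparison it also times a one-liner based on the standard-library `itertools.groupby`. The input size, value range and repetition count are arbitrary assumptions, and timings will depend on your data and Python version.

```python
import itertools
import random
import timeit


def collapse_loop(my_list):
    # Loop approach from the question, with the last item always kept
    result = []
    for i in range(len(my_list) - 1):
        # Keep an item only if the next one differs (it ends a run)
        if my_list[i] != my_list[i + 1]:
            result.append(my_list[i])
    if my_list:
        result.append(my_list[-1])
    return result


def collapse_groupby(my_list):
    # itertools.groupby yields one (key, group) pair per run of equal items
    return [key for key, _ in itertools.groupby(my_list)]


# Arbitrary test data: 100,000 small ints, so runs of repeats are common
random.seed(0)
data = [random.randint(0, 3) for _ in range(100_000)]

# Sanity check: both approaches must agree before timing them
assert collapse_loop(data) == collapse_groupby(data)

for func in (collapse_loop, collapse_groupby):
    seconds = timeit.timeit(lambda: func(data), number=10)
    print(f"{func.__name__}: {seconds:.3f} s for 10 runs")
```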

score 1 · Answer 1 · answered Mar 27 '23 at 10:04

I think we can use a list comprehension:

myList = [1, 2, 3, 3, 2, 4, 4, 1, 4]

newList = [myList[i] for i in range(len(myList))
           if i == 0 or myList[i] != myList[i - 1]]

print(newList)  # Output: [1, 2, 3, 2, 4, 1, 4]
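To check the edge cases raised in the comments (first and last items equal, an all-equal list, an empty list), the same comprehension can be wrapped in a function; `collapse_adjacent` is a hypothetical name used only for illustration. Because `i == 0` is tested first and `or` short-circuits, item 0 is never compared with item -1.

```python
def collapse_adjacent(my_list):
    # Keep index 0 unconditionally; afterwards keep an item only if it
    # differs from its predecessor. The `i == 0` test short-circuits, so
    # my_list[-1] (the last item) is never compared with my_list[0].
    return [my_list[i] for i in range(len(my_list))
            if i == 0 or my_list[i] != my_list[i - 1]]


print(collapse_adjacent([1, 2, 3, 3, 2, 4, 4, 1, 4]))  # [1, 2, 3, 2, 4, 1, 4]
print(collapse_adjacent([1, 1, 1]))                    # [1]
print(collapse_adjacent([4, 1, 4]))                    # [4, 1, 4]
print(collapse_adjacent([]))                           # []
```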

DecoderS

I see we both ran into the issue of what happens when the first and last list item are the same - your solution for this seems a lot more elegant than mine was :) – alexhroom Mar 27 '23 at 10:06
@Ake: Calling `len` on a Python list is constant time. It's not like a Lisp-style cons-based linked list or something like that, where you would have to traverse the list to determine its length. Python lists are dynamic arrays. – user2357112 Mar 27 '23 at 10:19

score 0 · Answer 2 · answered Mar 27 '23 at 10:03

Here's a more concise version of your implementation:

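def shorten_list(input_list):
    # An empty list has nothing to collapse (and no input_list[0])
    if not input_list:
        return []
    # The first item is always kept
    output_list = [input_list[0]]
    for item in input_list[1:]:
        # Keep an item only if it differs from the last one kept
        if output_list[-1] != item:
            output_list.append(item)

    return output_list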
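The same "compare with the last kept item" idea also works on any iterable, not just lists, if the last item is tracked in a variable instead of being read back from the output list. This is a minimal generator sketch (`collapse_iter` and `_MISSING` are hypothetical names); it uses a private sentinel object so that the first item is always yielded, even when it is `None`.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

# Private sentinel: it is never identical to any real input item
_MISSING = object()


def collapse_iter(items: Iterable[T]) -> Iterator[T]:
    last: object = _MISSING
    for item in items:
        # Yield the first item, then only items that differ from the previous one
        if last is _MISSING or item != last:
            yield item
        last = item


print(list(collapse_iter([1, 2, 3, 3, 2, 4, 4, 1, 4])))  # [1, 2, 3, 2, 4, 1, 4]
print(list(collapse_iter(iter([None, None, 5]))))       # [None, 5]
```

Because it never indexes or slices its input, this version also accepts generators and other one-shot iterators.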

EDIT: As pointed out in the comments, the list comprehension below gets very slow for long lists because it rebuilds `[None]+my_list` on every iteration. DecoderS' answer gives a better list comprehension.

Or, as a list comprehension:

[my_list[i] for i in range(len(my_list)) if my_list[i] != ([None]+my_list)[i]]

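This "right-shifts" the list by one position (padding the front with None) and compares it element-wise with the original list. It is more robust than the plain condition if my_list[i] != my_list[i-1], which fails on the list [1, 1, 1]: at position 0 it compares item 0 with item -1 (the last item), both of which are 1, so the result would be empty instead of [1]. (The None padding has a similar blind spot if the list itself starts with None.)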

edited Mar 27 '23 at 10:13

answered Mar 27 '23 at 10:03

alexhroom

Your list comprehension recomputes `[None]+my_list` on every iteration, making it catastrophically slow as the input size increases. – user2357112 Mar 27 '23 at 10:11
@user2357112 I agree; user DecoderS has done the same thing more efficiently! – alexhroom Mar 27 '23 at 10:12
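Following the comment above, one way to keep the "right-shift" idea but make it linear is to build the shifted list once and pair it with the original using `zip`. This is a sketch (`collapse_shifted` is a hypothetical name) that also swaps the `None` padding for a fresh `object()` sentinel, since `None` padding would still drop the first item of a list that starts with `None`.

```python
def collapse_shifted(my_list):
    # Build the right-shifted list once (O(n)) instead of on every iteration.
    # A fresh object() is the leading sentinel, so it never equals a real item.
    shifted = [object()] + my_list
    # zip stops at the shorter input, so the extra last element of `shifted`
    # is ignored; each item is paired with its predecessor (or the sentinel).
    return [item for item, prev in zip(my_list, shifted) if item != prev]


print(collapse_shifted([1, 2, 3, 3, 2, 4, 4, 1, 4]))  # [1, 2, 3, 2, 4, 1, 4]
print(collapse_shifted([1, 1, 1]))                    # [1]
print(collapse_shifted([None, None, 5]))              # [None, 5]
```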

score 0 · Accepted Answer · answered Mar 27 '23 at 10:05

This is one of the uses of the standard itertools.groupby function:

import itertools

# groupby yields one (key, group) pair per run of consecutive equal items
deduplicated_list = [item for item, _ in itertools.groupby(myList)]
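To see why this works: `groupby` does not sort or deduplicate globally; it only groups runs of consecutive equal items. Printing each key alongside the length of its run (a run-length encoding, shown purely for illustration) makes that visible:

```python
import itertools

my_list = [1, 2, 3, 3, 2, 4, 4, 1, 4]

# Without a key function, each run's key is the item itself;
# len(list(group)) consumes the group to count the run length.
runs = [(key, len(list(group))) for key, group in itertools.groupby(my_list)]
print(runs)  # [(1, 1), (2, 1), (3, 2), (2, 1), (4, 2), (1, 1), (4, 1)]

# Keeping only the keys collapses each run to a single entry
print([key for key, _ in runs])  # [1, 2, 3, 2, 4, 1, 4]
```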

user2357112