How to delete consecutive duplicates in a list of lists efficiently?

Question

I have a nested list:

l = [['GILTI', 'was', 'intended', 'to','to', 'stifle', 'multinationals', 'was'],
    ['like' ,'technology', 'and', 'and','pharmaceutical', 'companies', 'like']]

How can I detect two identical consecutive elements and delete one of them, without using set or a similar operation? This is the desired output:

l = [['GILTI', 'was', 'intended','to', 'stifle', 'multinationals', 'was'],
    ['like' ,'technology', 'and','pharmaceutical', 'companies', 'like']]

I tried using itertools groupby like this:

from itertools import groupby  
[i[0] for i in groupby(l)]

I also tried an OrderedDict:

from collections import OrderedDict

temp_lis = []
for x in l:
    temp_lis.append(list(OrderedDict.fromkeys(x)))
temp_lis

out:

[['GILTI', 'was', 'intended', 'to', 'stifle', 'multinationals'],
 ['like', 'technology', 'and', 'pharmaceutical', 'companies']]

The second solution might look like it works. However, it is wrong, because it also deletes non-consecutive repeated elements (e.g. 'was' and 'like'). How can I get the desired output shown above?

Austin · Accepted Answer · 2019-08-13T05:25:37.233

2

You can use groupby like so:

[[k for k, g in groupby(x)] for x in l]

This keeps a single element from each run of consecutive repeated elements.

If you need to remove consecutive repeated elements completely, so that no copy is kept, use:

[[k for k, g in groupby(x) if len(list(g)) == 1] for x in l]

Example:

from itertools import groupby

l = [['GILTI', 'was', 'intended', 'to','to', 'stifle', 'multinationals', 'was'],
    ['like' ,'technology', 'and', 'and','pharmaceutical', 'companies', 'like']]

print([[k for k, g in groupby(x)] for x in l])

# [['GILTI', 'was', 'intended', 'to', 'stifle', 'multinationals', 'was'],
#  ['like', 'technology', 'and', 'pharmaceutical', 'companies', 'like']]
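
For comparison, here is a minimal sketch of the second variant run on the same data. The variable name `removed` is only illustrative. Every run longer than one element is dropped entirely, so neither copy of 'to' or 'and' survives:

```python
from itertools import groupby

l = [['GILTI', 'was', 'intended', 'to', 'to', 'stifle', 'multinationals', 'was'],
     ['like', 'technology', 'and', 'and', 'pharmaceutical', 'companies', 'like']]

# Keep a key only when its run of consecutive equal elements has length 1.
removed = [[k for k, g in groupby(x) if len(list(g)) == 1] for x in l]
print(removed)

# [['GILTI', 'was', 'intended', 'stifle', 'multinationals', 'was'],
#  ['like', 'technology', 'pharmaceutical', 'companies', 'like']]
```

Note that `len(list(g))` consumes the group iterator, which is fine here because only the key `k` is used afterwards.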

edited Aug 13 '19 at 05:25

answered Aug 13 '19 at 05:18


Thanks for the help again! What about a more specific solution? What if I'm only interested in removing `'to','to'` sequences? – aywoki Aug 13 '19 at 05:19
1

@aywoki, so you don't want both `'to'`s? – Austin Aug 13 '19 at 05:20
Yes, I'm just curious about how to iterate in that case. This solution solves the problem, though. – aywoki Aug 13 '19 at 05:23
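
One possible way to handle the case raised in these comments is to filter runs only for chosen values. This is a sketch, not part of the original answer: the helper `filter_runs` and its parameters `targets` and `keep_one` are hypothetical names. Since the comments leave open whether one `'to'` should remain, the flag covers both readings:

```python
from itertools import groupby

def filter_runs(seq, targets, keep_one=True):
    """Hypothetical helper: act only on repeated runs of values in `targets`."""
    out = []
    for key, group in groupby(seq):
        run = list(group)
        if key in targets and len(run) > 1:
            # A repeated run of a targeted value: keep one copy or none.
            if keep_one:
                out.append(key)
        else:
            # Any other run is copied through unchanged.
            out.extend(run)
    return out

l = [['GILTI', 'was', 'intended', 'to', 'to', 'stifle', 'multinationals', 'was'],
     ['like', 'technology', 'and', 'and', 'pharmaceutical', 'companies', 'like']]

collapsed = [filter_runs(x, {'to'}) for x in l]
dropped = [filter_runs(x, {'to'}, keep_one=False) for x in l]
```

With `targets={'to'}`, the `'and', 'and'` pair in the second sublist is left untouched in both results.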

Amadan · Answer 2 · 2019-08-13T05:35:36.583

2

A custom generator solution:

def deduped(seq):
    first = True
    for el in seq:
        if first or el != prev:
            yield el
            prev = el
            first = False

[list(deduped(seq)) for seq in l]
# => [['GILTI', 'was', 'intended', 'to', 'stifle', 'multinationals', 'was'], 
#     ['like', 'technology', 'and', 'pharmaceutical', 'companies', 'like']]

EDIT: The previous version couldn't handle None being the first element.

edited Aug 13 '19 at 05:35

answered Aug 13 '19 at 05:27


2

`prev = object()` sentinel would also solve the first element issue – VPfB Aug 13 '19 at 06:33
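
A minimal sketch of the sentinel variant suggested in this comment, assuming the elements use ordinary equality. The name `deduped_sentinel` is illustrative. A fresh `object()` is not equal to any list element, so the `first` flag is no longer needed and a leading `None` is handled correctly:

```python
def deduped_sentinel(seq):
    # A fresh object() compares unequal to every real element, including None.
    prev = object()
    for el in seq:
        if el != prev:
            yield el
            prev = el

sample = [None, None, 'to', 'to', 'stifle', None]
print(list(deduped_sentinel(sample)))
# [None, 'to', 'stifle', None]
```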

bharatk · Answer 3 · 2019-08-13T05:24:42.207

enumerate() adds a counter to an iterable and returns it as an enumerate object. The index lets you compare each element with the one that follows it.

Example:

l = [['GILTI', 'was', 'intended', 'to', 'to', 'stifle', 'multinationals', 'was'],
     ['like', 'technology', 'and', 'and', 'pharmaceutical', 'companies', 'like']]
result = []

for sublist in l:
    new_list = []
    for index, x in enumerate(sublist):
        # Skip the current element when it equals the next one
        if index + 1 < len(sublist) and x == sublist[index + 1]:
            continue
        # Otherwise it is the last element of its run, so keep it
        new_list.append(x)
    # Collect the deduplicated sublist
    result.append(new_list)

print(result)

Output:

[['GILTI', 'was', 'intended', 'to', 'stifle', 'multinationals', 'was'], 
['like', 'technology', 'and', 'pharmaceutical', 'companies', 'like']]
