
My question is similar to this one, but instead of removing full duplicates, I'd like to remove consecutive partial "duplicates" from a list in Python.

For my particular use case, I want to remove runs of two or more consecutive words that start with the same character (a word that starts with it on its own should stay), and I want to be able to choose that character. For this example it's #, so

['#python', 'is', '#great', 'for', 'handling', 
'text', '#python', '#text', '#nonsense', '#morenonsense', '.']

should become

['#python', 'is', '#great', 'for', 'handling', 'text', '.']
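Stated as a rule: a word that starts with the marker is dropped exactly when at least one of its immediate neighbours also starts with the marker. A minimal sketch of that rule with plain index lookups (the name `drop_marker_runs` is hypothetical, not from any library):

```python
def drop_marker_runs(words, marker="#"):
    """Drop every word that starts with `marker` and has a neighbour that also does.

    Hypothetical helper illustrating the rule; a marked word standing alone is kept.
    """
    starts = [w.startswith(marker) for w in words]  # precompute the flags once
    result = []
    for i, word in enumerate(words):
        prev_marked = i > 0 and starts[i - 1]
        next_marked = i + 1 < len(words) and starts[i + 1]
        # keep unmarked words, and marked words with no marked neighbour
        if not starts[i] or not (prev_marked or next_marked):
            result.append(word)
    return result


words = ['#python', 'is', '#great', 'for', 'handling',
         'text', '#python', '#text', '#nonsense', '#morenonsense', '.']
print(drop_marker_runs(words))
# ['#python', 'is', '#great', 'for', 'handling', 'text', '.']
```

Using `str.startswith` instead of `w[0]` means empty strings in the list do not raise an `IndexError`.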
Moritz

3 Answers


You could use itertools.groupby:

>>> from itertools import groupby
>>> lst = ['#python', 'is', '#great', 'for', 'handling', 'text', '#python', '#text', '#nonsense', '#morenonsense', '.']    
>>> [s for k, g in ((k, list(g)) for k, g in groupby(lst, key=lambda s: s.startswith("#")))
...    if not k or len(g) == 1 for s in g]
...
['#python', 'is', '#great', 'for', 'handling', 'text', '.']

This groups consecutive elements by whether they start with #, then keeps only the groups whose elements do not start with #, or that consist of a single element.
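The same comprehension can be unrolled into a loop with the marker character as a parameter, as the question asks. This is a sketch; the function name `remove_runs` is hypothetical:

```python
from itertools import groupby


def remove_runs(lst, char="#"):
    """Keep unmarked groups, and marked groups of length 1 (hypothetical helper)."""
    result = []
    # groupby splits the list into maximal runs of consecutive items sharing a key
    for marked, group in groupby(lst, key=lambda s: s.startswith(char)):
        group = list(group)  # materialise the run so len() is available
        if not marked or len(group) == 1:
            result.extend(group)
    return result


lst = ['#python', 'is', '#great', 'for', 'handling', 'text',
       '#python', '#text', '#nonsense', '#morenonsense', '.']
print(remove_runs(lst))
# ['#python', 'is', '#great', 'for', 'handling', 'text', '.']
```

Note that `groupby` only groups *adjacent* items with equal keys, which is exactly what makes it suitable for "consecutive" runs here.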

tobias_k

Here's one solution using itertools.groupby. The idea is to group consecutive items by whether they start with a given character k, then apply your two criteria: a group is dropped only if its items start with k and it has more than one item; otherwise its items are yielded.

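from itertools import groupby

L = ['#python', 'is', '#great', 'for', 'handling', 'text',
     '#python', '#text', '#nonsense', '#morenonsense', '.']

def list_filter(L, k):
    # group consecutive items by whether they start with k
    # (startswith avoids an IndexError on empty strings, unlike x[0])
    grouper = groupby(L, key=lambda x: x.startswith(k))
    for i, j in grouper:
        items = list(j)
        # drop only groups of two or more items starting with k
        if not (i and len(items) > 1):
            yield from items

res = list_filter(L, '#')

print(list(res))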

['#python', 'is', '#great', 'for', 'handling', 'text', '.']
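Because `list_filter` is a generator and `groupby` accepts any iterable, the input does not have to be a list. A small usage sketch, assuming the words come from splitting a string (the `tweet` variable is purely illustrative); the generator is repeated so the block runs on its own:

```python
from itertools import groupby


def list_filter(L, k):
    # same generator as above: drop runs of 2+ consecutive words starting with k
    for starts_with_k, items in groupby(L, key=lambda x: x.startswith(k)):
        items = list(items)
        if not (starts_with_k and len(items) > 1):
            yield from items


tweet = "#python is #great for handling text #python #text #nonsense #morenonsense ."
# str.split() produces the words; the generator is consumed lazily by join()
print(" ".join(list_filter(tweet.split(), "#")))
# #python is #great for handling text .
```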
jpp

A single pass is enough, provided you keep some context: the previous element, and whether the element before it was kept.

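def filter_lst(lst, char):
    if not lst:            # nothing to filter; avoids an IndexError on lst[0]
        return []
    res = []               # the words kept so far, returned at the end
    keep = True            # was the element before `old` kept? (True at the start)
    old = lst[0]
    for word in lst[1:]:   # iterate (the first element is already in old)
        # old survives if it is unmarked, or if neither neighbour is marked
        if not old.startswith(char) or (keep and not word.startswith(char)):
            res.append(old)
            keep = True
        else:
            keep = False
        old = word
    if keep or not old.startswith(char):   # don't forget the last element!
        res.append(old)
    return res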

It gives:

>>> lst = ['#python', 'is', '#great', 'for', 'handling',
...        'text', '#python', '#text', '#nonsense', '#morenonsense', '.']
>>> filter_lst(lst, '#')
['#python', 'is', '#great', 'for', 'handling', 'text', '.']
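Hand-written state machines like this are easy to get subtly wrong, so one way to gain confidence is to cross-check the single-pass version against the `groupby` approach on many random inputs. A sketch of such a check; `groupby_filter` is a hypothetical name for the groupby variant, and the word pool (including an empty string) is arbitrary:

```python
import random
from itertools import groupby


def filter_lst(lst, char):
    """Single pass: keep `old` unless it is marked and touches another marked word."""
    if not lst:
        return []
    res = []
    keep = True            # was the element before `old` kept?
    old = lst[0]
    for word in lst[1:]:
        if not old.startswith(char) or (keep and not word.startswith(char)):
            res.append(old)
            keep = True
        else:
            keep = False
        old = word
    if keep or not old.startswith(char):
        res.append(old)
    return res


def groupby_filter(lst, char):
    """Reference version: drop marked runs of length 2 or more."""
    result = []
    for marked, group in groupby(lst, key=lambda s: s.startswith(char)):
        group = list(group)
        if not marked or len(group) == 1:
            result.extend(group)
    return result


random.seed(0)  # deterministic, so a failure can be reproduced
for _ in range(1000):
    words = [random.choice(['#a', 'b', '#c', 'd', ''])
             for _ in range(random.randint(0, 8))]
    assert filter_lst(words, '#') == groupby_filter(words, '#'), words
print("all random cases agree")
```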
Serge Ballesta