Remove similar items in a list in Python

Question

How do you remove similar items from a list in Python, but only for a given item? For example,

l = list('need')

If 'e' is the given item then

l = list('nd')

The set() function will not do the trick since it will remove all duplicates.

count() and remove() are not efficient.

Do you mean how to remove duplicate letters? The question is not clear enough: what does "similar items" mean? — Jon Nezbit, Aug 16 '20 at 00:31
Are you only wanting to remove a 'given item' if it is also a duplicate? What if the given removal letter is `'e'` what would you expect with `'nedd'`? — dawg, Aug 16 '20 at 00:53
My question is clear enough, because I gave the input and what is the expected output. — Djangonow, Aug 16 '20 at 01:39

or hayat · Accepted Answer · 2020-08-16T00:43:33.813

2

Use filter().

This assumes you write a function that decides which items to keep in the list.

For your example:

def pred(x):
    # Keep every item that is not "e".
    return x != "e"

l = list("need")
l = list(filter(pred, l))
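
The predicate above hard-codes "e". Here is a minimal sketch of the same filter idea with the item passed in as a parameter. `remove_item` is a hypothetical name used for illustration, not something from the answer:

```python
from functools import partial
from operator import ne


def remove_item(items, given):
    """Return a new list without any element equal to `given`."""
    # partial(ne, given) builds a predicate equivalent to: lambda x: given != x
    return list(filter(partial(ne, given), items))


result = remove_item(list("need"), "e")
print(result)  # ['n', 'd']
```

This leaves the input list untouched and returns a new one, which is usually the simplest behavior to reason about.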

edited Aug 16 '20 at 00:43

answered Aug 16 '20 at 00:40

or hayat


score 0 · Answer 2 · answered Aug 16 '20 at 00:34


Assuming `given = 'e'` and `l = list('need')`:

for i in range(l.count(given)):
    l.remove(given)
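
Each `remove()` call rescans the list from the start, so the loop above repeats work. If the list has to be changed in place, one possible alternative is slice assignment. This is only a sketch using the same `given` and `l`; `alias` is an extra name added to show that the list object is unchanged:

```python
given = 'e'
l = list('need')
alias = l  # second reference to the same list object

# Rebuild the contents in a single pass; slice assignment keeps the same list object.
l[:] = [x for x in l if x != given]

print(l)           # ['n', 'd']
print(alias is l)  # True
```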

answered Aug 16 '20 at 00:34

Ari K


`count()` returns `n`, the number of occurrences of an item in your list, so you can then call `remove()` `n` times. – Ari K Aug 16 '20 at 00:36
OP said count and remove are inefficient – hedy Aug 16 '20 at 00:37
that wasn't in the question when i first answered it lol – Ari K Aug 16 '20 at 00:37

score 0 · Answer 3 · answered Aug 16 '20 at 01:07

If you just want to remove 'e' from each word in a list, you can use the regex function re.sub(). If you also want a count of how many occurrences of 'e' were removed from each word, use re.subn(). With the list comprehensions below, the first gives a list of strings. The second gives a list of tuples (string, n), where n is the number of replacements made.

import re
lst = ['need', 'feed', 'seed', 'deed', 'made', 'weed', 'said']
j = [re.sub('e', '', i) for i in lst]   # strings with every 'e' removed
k = [re.subn('e', '', i) for i in lst]  # (string, number_of_replacements) tuples

The outputs for j and k are:

j = ['nd', 'fd', 'sd', 'dd', 'mad', 'wd', 'said']
k = [('nd', 2), ('fd', 2), ('sd', 2), ('dd', 2), ('mad', 1), ('wd', 2), ('said', 0)]

If you want the total number of replacements, iterate through k and sum the counts. There is also a simpler way: call re.subn() once on the joined string.

re.subn('e','',''.join(lst))[1]

This gives the total number of replacements made across the list.
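
As a rough check, the two ways of getting the total described above (summing the counts in k, and calling re.subn() on the joined string) can be compared directly. This sketch reuses `lst` from the answer; `total_from_k` and `total_from_join` are illustrative names:

```python
import re

lst = ['need', 'feed', 'seed', 'deed', 'made', 'weed', 'said']
k = [re.subn('e', '', word) for word in lst]

# Sum the per-word replacement counts stored in the tuples.
total_from_k = sum(count for _, count in k)

# Join the words first, then count replacements in one call.
total_from_join = re.subn('e', '', ''.join(lst))[1]

print(total_from_k, total_from_join)  # 11 11
```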

TheLazyScripter · Answer 4 · 2020-08-16T02:48:36.720

List comprehension method. I'm not sure whether its time complexity is lower than that of count and remove.

def scrub(l, given):
    return [i for i in l if i not in given]

Filter method. Again, I'm not sure about its complexity.

def filter_by(l, given):
    return list(filter(lambda x: x not in given, l))
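
Both functions above test `x not in given` against a list, which scans `given` for every element. When `given` holds many items, converting it to a set first may help. This is a sketch that assumes the items are hashable, and `scrub_with_set` is a hypothetical name:

```python
def scrub_with_set(items, given):
    """Return a new list without any element found in `given`."""
    unwanted = set(given)  # assumes hashable items; membership tests are O(1) on average
    return [x for x in items if x not in unwanted]


print(scrub_with_set(list("need"), list("ne")))  # ['d']
```

For a two-item `given` like the one in this thread, the difference is unlikely to matter; the set only pays off as the number of items to remove grows.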

Brute force with recursion. It has a lot of potential pitfalls, but it is still an option. Again, I don't know its complexity.

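def bruteforce(l, given):
    # Mutates l in place and recurses once per removed element.
    try:
        l.remove(given[0])
        return bruteforce(l, given)
    except ValueError:
        # given[0] no longer occurs in l; move on to the next item.
        return bruteforce(l, given[1:])
    except IndexError:
        # given is empty; nothing is left to remove.
        return l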
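
The recursion depth here grows with the number of removed elements, so long lists can hit Python's recursion limit. A loop-based sketch of the same remove-until-ValueError idea avoids that. `remove_all_loop` is a hypothetical name, and like the recursive version it mutates the list it is given:

```python
def remove_all_loop(items, given):
    """Remove every occurrence of each value in `given` from `items`, in place."""
    for value in given:
        while True:
            try:
                items.remove(value)
            except ValueError:
                break  # no occurrences of this value remain
    return items


data = list("need" * 1000)  # 1000 'n' and 2000 'e' to remove
print(remove_all_loop(data, list("ne")) == list("d" * 1000))  # True
```

This still rescans the list on every `remove()` call, so it only addresses the recursion limit, not the running time.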

For those curious about the actual timings of the methods above, I've taken the liberty of testing them below.

This is the timing helper I used.

import datetime

def timer(func, name):
    print("-------{}-------".format(name))
    try:
        start = datetime.datetime.now()
        func()
        end = datetime.datetime.now()
        # Total elapsed microseconds (.microseconds alone drops whole seconds).
        print(int((end - start).total_seconds() * 1000000))
    except Exception as e:
        print("Failed: {}".format(e))
    print()
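
A single datetime-based run can be noisy. As an alternative sketch, the standard-library timeit module can repeat a call and report the fastest run. `best_time`, its default arguments, and the small `sample` list are hypothetical choices, not part of the benchmark in this answer:

```python
import timeit


def best_time(func, repeat=3, number=1):
    """Return the fastest of `repeat` timings (in seconds) of `number` calls to func."""
    return min(timeit.repeat(func, repeat=repeat, number=number))


sample = list("need" * 1000)
seconds = best_time(lambda: [x for x in sample if x not in "ne"])
print("{:.6f} s".format(seconds))
```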

The dataset we test against, where l is the original list, q holds the items to remove, and r is the expected result:

l = list("need"*50000)
q = list("ne")
r = list("d"*50000)

For completeness, I've added the count/remove method the OP was against (for good reason!).

def count_remove(l, given):
    for i in given:
        for x in range(l.count(i)):
            l.remove(i)
    return l

All that's left to do is test!

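timer(lambda: scrub(l, q), "List Comp")
assert scrub(l, q) == r

timer(lambda: filter_by(l, q), "Filter")
assert filter_by(l, q) == r

# count_remove and bruteforce mutate their argument, so pass a copy each time.
timer(lambda: count_remove(list(l), q), "Count/Remove")
assert count_remove(list(l), q) == r

timer(lambda: bruteforce(list(l), q), "Bruteforce")
# Checked on a small input, since the full dataset exceeds the recursion limit.
assert bruteforce(list("need"), q) == list("d")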

And our results (elapsed time in microseconds):

-------List Comp-------
10000

-------Filter-------
28000

-------Count/Remove-------
199000

-------Bruteforce-------
Failed: maximum recursion depth exceeded

Process finished with exit code 0

The recursive method failed on the larger dataset, as expected, because every removed element adds another stack frame. I also tested on smaller datasets, where recursion was marginally slower. I had thought it would be faster.
