How do you remove all occurrences of one given item from a list in Python, while leaving other duplicates alone? For example:

l = list('need')

If 'e' is the given item, then the result should be:

l = list('nd')

The set() function will not do the trick since it will remove all duplicates.

count() and remove() are not efficient.

Djangonow

4 Answers

Use filter, assuming you write a function that decides which items you want to keep in the list.

For your example:

 def pred(x):
     return x!="e"
 l=list("need")
 l=list(filter(pred,l))
or hayat

Assuming given = 'e' and l = list('need'):

for i in range(l.count(given)):
    l.remove(given)
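
Each call to l.remove(given) rescans the list from the front and shifts the remaining elements, so the loop above does work roughly proportional to len(l) times the number of matches. A minimal sketch of a single-pass alternative that still modifies the same list object, assuming in-place behaviour is what matters here (the helper name remove_all is made up for illustration):

```python
def remove_all(items, given):
    """Remove every occurrence of `given` from `items` in place, in one pass."""
    # Slice assignment replaces the contents of the existing list object,
    # so other references to the same list see the change too.
    items[:] = [x for x in items if x != given]


l = list('need')
alias = l              # second reference to the same list
remove_all(l, 'e')
print(l)               # ['n', 'd']
print(alias is l)      # True
```

If a new list is acceptable, the slice on the left-hand side can be dropped and the comprehension assigned to a fresh name instead.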
Ari K

If you just want to remove 'e' from each word in a list of words, you can use the regex function re.sub(). If you also want a count of how many occurrences of 'e' were removed from each word, use re.subn(). The first gives you a list of strings. The second gives you a list of tuples (string, n), where n is the number of occurrences replaced.

import re
lst = ['need', 'feed', 'seed', 'deed', 'made', 'weed', 'said']
j = [re.sub('e','',i) for i in lst]
k = [re.subn('e','',i) for i in lst]

The outputs for j and k are:

j = ['nd', 'fd', 'sd', 'dd', 'mad', 'wd', 'said']
k = [('nd', 2), ('fd', 2), ('sd', 2), ('dd', 2), ('mad', 1), ('wd', 2), ('said', 0)]

If you want the total number of changes made, iterate through k and sum the counts. There are simpler ways too. For example, you can join the words and call re.subn() once:

re.subn('e','',''.join(lst))[1]

This gives you the total number of occurrences replaced across the whole list.
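
For a fixed single character like 'e', the same results can be had without regular expressions. The sketch below uses only str.replace and str.count, and it assumes the target is a literal substring rather than a pattern (the variable names are chosen for illustration):

```python
lst = ['need', 'feed', 'seed', 'deed', 'made', 'weed', 'said']
target = 'e'  # literal substring to strip, not a regex pattern

# Strip the target from every word.
stripped = [word.replace(target, '') for word in lst]

# Pair each stripped word with the number of occurrences removed.
with_counts = [(word.replace(target, ''), word.count(target)) for word in lst]

# Total occurrences removed across the whole list.
total = sum(word.count(target) for word in lst)

print(stripped)
print(with_counts)
print(total)
```

If the target may contain regex metacharacters and re.sub() is still preferred, passing it through re.escape() first keeps it literal.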

Joe Ferndz

List comprehension method. I'm not sure whether its size/complexity is less than that of count and remove.

def scrub(l, given):
    return [i for i in l if i not in given]
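
On the complexity question: the comprehension walks the list once, but each `i not in given` check scans `given`, so the cost grows with len(l) times len(given). A sketch of a variant that converts `given` to a set first, so each membership check is constant time on average; it assumes the items are hashable, and the name scrub_set is hypothetical:

```python
def scrub_set(items, given):
    """Return a new list without any element that appears in `given`."""
    unwanted = set(given)  # assumes the elements are hashable
    return [x for x in items if x not in unwanted]


print(scrub_set(list('need'), 'e'))    # ['n', 'd']
print(scrub_set(list('need'), 'ne'))   # ['d']
```

For a handful of items to remove, the difference from the plain list check is likely to be small.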

Filter method. Again, I'm not sure about its complexity.

def filter_by(l, given):
    return list(filter(lambda x: x not in given, l))

Brute force with recursion. There are a lot of potential pitfalls, but it is still an option. Again, I don't know the size/complexity.

def bruteforce(l, given):
    try:
        l.remove(given[0])
        return bruteforce(l, given)
    except ValueError:
        return bruteforce(l, given[1:])
    except IndexError:
        return l
    return l
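
Every call to bruteforce recurses one level deeper, whether the remove succeeds or not, so the recursion depth grows with the number of items removed. A sketch of the same remove-until-ValueError idea written as a loop, which avoids the recursion limit but keeps the repeated rescanning of the list (the name bruteforce_iter is hypothetical):

```python
def bruteforce_iter(items, given):
    """Remove every element of `given` from `items` in place, using loops."""
    for target in given:
        while True:
            try:
                items.remove(target)   # removes the first match only
            except ValueError:
                break                  # no more matches for this target
    return items


print(bruteforce_iter(list('need' * 3), 'ne'))  # ['d', 'd', 'd']
```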

For those of you curious about the actual time taken by the above methods, I've taken the liberty of testing them below!

Below is the timing helper I've chosen to use.

import datetime

def timer(func, name):
    print("-------{}-------".format(name))
    try:
        start = datetime.datetime.now()
        func()
        end = datetime.datetime.now()
        # total elapsed time in microseconds (.microseconds alone drops whole seconds)
        print(int((end - start).total_seconds() * 1e6))
    except Exception as e:
        print("Failed: {}".format(e))
    print("\r")

The dataset we are testing against, where l is our original list, q is the items we want to remove, and r is our expected result:

l = list("need"*50000)
q = list("ne")
r = list("d"*50000)

For posterity I've added the count / remove method the OP was against. (For good reason!)

def count_remove(l, given):
    for i in given:
        for x in range(l.count(i)):
            l.remove(i)
    return l

All that's left to do is test!

timer(lambda: scrub(l, q), "List Comp")
assert(scrub(l,q) == r)

timer(lambda: filter_by(l, q), "Filter")
assert(filter_by(l,q) == r)

timer(lambda : count_remove(l, q), "Count/Remove")
assert(count_remove(l,q) == r)

timer(lambda: bruteforce(l, q), "Bruteforce")
assert(bruteforce(l,q) == r)

And our results (in microseconds):

-------List Comp-------
10000

-------Filter-------
28000

-------Count/Remove-------
199000

-------Bruteforce-------
Failed: maximum recursion depth exceeded

Process finished with exit code 0

The recursive method failed on the larger dataset, but we expected this. I also tested on smaller datasets, where recursion was marginally slower. I had thought it would be faster.
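
One thing to watch when repeating this comparison: count_remove and bruteforce modify the list they are given, so each run needs a fresh copy to be comparable. A hedged sketch of how that could be done with timeit from the standard library; the smaller dataset and the helper name time_on_copy are assumptions chosen for illustration, and no timings are claimed here:

```python
import timeit


def scrub(l, given):
    return [i for i in l if i not in given]


def count_remove(l, given):
    for i in given:
        for _ in range(l.count(i)):
            l.remove(i)
    return l


def time_on_copy(func, data, given, repeat=3):
    """Best wall-clock time in seconds for func run on a fresh copy of data."""
    # The copy is made inside the timed call, so its cost is included
    # equally for every function being compared.
    return min(timeit.repeat(lambda: func(list(data), given),
                             number=1, repeat=repeat))


data = list("need" * 2000)   # smaller than the original test, for a quick run
given = list("ne")

for name, func in [("List Comp", scrub), ("Count/Remove", count_remove)]:
    print("{}: {:.6f} s".format(name, time_on_copy(func, data, given)))

assert data == list("need" * 2000)   # the original list is left untouched
```

Taking the minimum over a few repeats is a common way to reduce noise from other processes on the machine.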

TheLazyScripter