
I am trying to find a way to remove all the occurrences of an item in a list in Python. To illustrate, suppose my list is:

foo_list = [1,2,3,4,2,3]

And let's suppose I am trying to get rid of the item 2. If I use the .remove method, it will just delete the first 2 in my list.

foo_list.remove(2)

This produces [1,3,4,2,3], but I would like to get [1,3,4,3]. Of course I can do this using a list comprehension such as:

[item for item in foo_list if item != 2]

I could also do set(foo_list), but I want to keep the duplicate elements that are not the selected one, 2 in this case.

But I am searching for a way to do it without a for loop, as my real list has more than 100,000 items, which makes this procedure really slow. Is there any method similar to remove that would delete all the selected items at once?

Any help would be appreciated.

Marisa
  • I'm pretty sure that a list comprehension will be significantly faster than repeatedly calling `remove` on a list, and I can't think of any other way to get rid of all occurrences of a list element. – Aran-Fey Sep 20 '18 at 10:57
  • Whatever method is used, it will have to iterate the list at least once. List comprehension should be up there with any alternative. Also, `10000` is not that big a number. – user2390182 Sep 20 '18 at 10:57
  • Another option is to populate a numpy array and then run a mask/filter on the array – skibee Sep 20 '18 at 10:59
  • Possible duplicate of [Removing duplicates in lists](https://stackoverflow.com/questions/7961363/removing-duplicates-in-lists) – bobah Sep 20 '18 at 11:04

5 Answers


You could always use filter, but it won't be any faster than a list comprehension.

list(filter(lambda x: x != 2, foo_list))

Let's look at some timings using IPython:

import random

# make a large list of ints
bar_list = [random.randint(1,10000) for _ in range(100000)]

%timeit list(filter(lambda x: x != 2, bar_list))
100 loops, best of 3: 10.3 ms per loop

%timeit [x for x in bar_list if x != 2]
100 loops, best of 3: 4.34 ms per loop

The list comprehension is about twice as fast as using filter.
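As suggested in the comments, another option is to push the filtering into NumPy, whose boolean masks run in C rather than in a Python-level loop. A minimal sketch, assuming NumPy is installed and the list holds plain ints:

```python
import numpy as np

foo = [1, 2, 3, 4, 2, 3]
arr = np.array(foo)
kept = arr[arr != 2]      # boolean mask: keeps every entry that is not 2
result = kept.tolist()    # convert back to a plain Python list if needed
print(result)  # [1, 3, 4, 3]
```

For a list of 100,000 ints, the conversion to and from an array has a cost, so it's worth timing against the list comprehension on your own data.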

James

Edit (optimizing the list comprehension performance):

To optimize the list comprehension in this example: since the 'lookup' list of words to be removed contains unique values, it can first be converted into a set, which gives much faster membership tests during the list comprehension.

def remove_all_from_other_list(_list, _remove_list):
    _remove_list = set(_remove_list)
    return [v for v in _list if v not in _remove_list]
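For instance, applied to a small stopword-removal case like the one discussed in the comments (the sample words below are invented for illustration):

```python
def remove_all_from_other_list(_list, _remove_list):
    _remove_list = set(_remove_list)  # set gives O(1) membership tests
    return [v for v in _list if v not in _remove_list]

words = ["the", "cat", "sat", "on", "the", "mat"]
kept = remove_all_from_other_list(words, ["the", "on"])
print(kept)  # ['cat', 'sat', 'mat']
```

Order and repetitions of the remaining words are preserved, which matches the asker's requirement.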

Check this gist to compare the performance of all of the solutions presented here: https://gist.github.com/fsschmitt/4b2c8963485e46b4483746624b5a2bff

Summary:

  • list comprehension: 55.785589082 seconds.

  • list comprehension with set: 17.348955028000006 seconds.

  • list filtering: 79.495240288 seconds.

  • for cycles: 70.14259565200001 seconds.


The easy way, with comparably better performance, to remove all occurrences of a value is a list comprehension.

def remove_all(_list, value):
    return [v for v in _list if v != value]

Alternatively, you can always take advantage of the built-in filter function:

def remove_all(_list, value):
    return list(filter(lambda v: v != value, _list))

Usage:

>>> remove_all([1, 2, 3, 4, 2, 3], 2)
[1, 3, 4, 3]

It will definitely be more performant than invoking the .remove method multiple times and checking for remaining occurrences each time.

Let me know the specifics of the 'avoid list comprehension' requirement so that I can think of another workaround if needed.

fsschmitt
  • I have a dataframe where each file is a list and each item of them is a word, and I would like to remove all the stopwords from each of the lists. Each of the lists can contain up to 10000 words and I have a total of 40k rows in my dataframe, therefore 40k lists on which I would need to do the list comprehension. The execution time is extremely long, so I thought that if I could remove all the items at once, it could be faster. – Marisa Sep 20 '18 at 11:34
  • I have created a quick script to measure the performance difference between using the list comprehension, filtering and for cycles. https://gist.github.com/fsschmitt/2c1111f93e2b75e0e30545545796e483 Hopefully will shed some light. In the meantime I can try to find a similar dataset to check the results. – fsschmitt Sep 20 '18 at 15:29
  • @Marisa, do you need to maintain order and repetition of words? (e.g. word 'floor' showing up twice in your set) – fsschmitt Sep 20 '18 at 15:52
  • Yes, I need to keep their repetition. I just want to delete words that are given in a corpus; if a word is not in this corpus I would like to keep it and all its repetitions. Order is not important – Marisa Sep 20 '18 at 15:55
  • @Marisa Might have found a sweet spot that optimizes the list comprehension. Check this new gist: https://gist.github.com/fsschmitt/4b2c8963485e46b4483746624b5a2bff I'll update my answer to reflect this finding. Feel free to run some tests with it, and if it improves your performance, accept the answer so other people can quickly find it! ;) – fsschmitt Sep 20 '18 at 16:15
  • Hi @Marisa, did that help you out? – fsschmitt Sep 25 '18 at 13:43

You can use filter with a lambda, like filter(lambda x: x != 2, foo_list). Note that in Python 3 filter returns a lazy iterator, so wrap it in list() if you need a list back.

Abhishek Ranjan

The only problem I see with using a list comprehension is that you'll briefly store both lists in memory.

You can try this:

def remove_repeated_elements(element, list_):
    try:
        # list.remove raises ValueError once the element is gone
        while True:
            list_.remove(element)
    except ValueError:
        pass
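If the goal is to mutate the list in place without repeated scans, another sketch (the function name here is mine) rebuilds the list once and assigns back through a slice, so every reference to the same list object sees the change:

```python
def remove_all_inplace(element, list_):
    # One pass builds the filtered list; the slice assignment mutates
    # the original list object instead of rebinding the name.
    list_[:] = [v for v in list_ if v != element]

nums = [1, 2, 3, 4, 2, 3]
remove_all_inplace(2, nums)
print(nums)  # [1, 3, 4, 3]
```

This trades the repeated O(n) scans of remove for a single pass, at the cost of temporarily holding the filtered copy in memory.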
David Garaña

remove() only removes the first occurrence of the element. I don't have much idea about timing, but you can try this (note that it rescans the list on every pass, so it can be slow for large lists):

foo_list = [1,2,3,4,2,3]
while 2 in foo_list: foo_list.remove(2)
print(foo_list)
Nirali Khoda