17

I was wondering, if there is way in Python to modify collections without creating new ones. E.g.:

lst = [1, 2, 3, 4, 5, 6]
new_lst = [i for i in lst if i > 3]

Works just fine, but a new collection is created. Is there a reason, that Python collections lack a filter() method (or similar) that would modify the collection object in place?

Ciro Santilli OurBigBook.com
  • 347,512
  • 102
  • 1,199
  • 985
Zaur Nasibov
  • 22,280
  • 12
  • 56
  • 83
  • If you really, absolutely *must* modify it in place, why not examine each value, and pop(i) the ones you dislike? – Gustav Bertram Nov 07 '11 at 14:14
  • 1
    On the "lack" of other methods, it's because in Python "There should be one-- and preferably only one --obvious way to do it." The list slicing operation in the answer below is the preferred way of doing in-place modification. It should come naturally orthogonal since lst[index] accesses a single element, lst[start:stop] accesses a span/slice of elements. – shimofuri Nov 07 '11 at 14:17
  • 1
    BasicWolf In fact you're right: as far as I know, there is no method or function that processes an in-place transformation and the question "why so ?" is valid. The fact that it's possible to write snippets that do such transformations isn't a justifying reason, otherwise it would be sufficient that we can write our own snippets to do reversing of sequences to justify that there wouldn't be reversed() as a built-in function. But there is reversed() in built-in features.... I upvote because your question comes to appear as stimulating reflection. – eyquem Nov 07 '11 at 18:07
  • @eyquem I think the reason for that is straigthforward. Since their isn't an efficient way to filter a Python `list` in-place, no way to do it is provided -- it would encourage people to try and do it in-place when it's better not to. There _is_ however a very efficient way to iterate over it in reverse, so a solution is provided. – agf Nov 09 '11 at 23:14
  • Still, there are lots of unordered collections. E.g. filtering a `set` or a `dict`. – Zaur Nasibov Nov 10 '11 at 07:39

7 Answers7

31

If you want to do this in place, just use

lst[:] = [i for i in lst if i > 3]

This won't be faster or save any memory, but it changes the object in place, if this is the semantics you need.

Community
  • 1
  • 1
Sven Marnach
  • 574,206
  • 118
  • 941
  • 841
  • So what is the point in this code? `lst = [...]` has the same effect, hasn't it? – Zaur Nasibov Nov 07 '11 at 14:04
  • 3
    If you are inside a function and don't want to return the new list for some reason you need to modify it in-place so the changes are available outside. – ThiefMaster Nov 07 '11 at 14:05
  • 4
    @BasicWolf: the main difference is that this does not allocate a new list, so if the list is shared between clients, it is modified everywhere. – Fred Foo Nov 07 '11 at 14:06
  • It's strange that you explained in your other cited answer that the right member of the assignement instruction is first evaluated (hence a new object is created elsewhere in the memory) and that in this answer your write that this instruction changes the list in place. It is not pure in-place changing , in my opinion. Saying that ``id(lst)`` remains the same after the assignement than before would be a more correct description but wouldn't correspond to the question anyway, alas – eyquem Nov 07 '11 at 16:10
  • @eyquem: In the linked question, the emphasis is on performance and memory consumption. In this question, the emphasis is on semantics. Semantically, this is an in-place operation, but in terms of memory consumption, it behaves like an out-of-place operation. I think this becomes clear from reading the answers. – Sven Marnach Nov 07 '11 at 16:32
  • In the linked answer, what interests me relatively to the present question is that it describes precisely the fact that another object is always created when using the slicing notation at the left side of an assignement instruction, not the fact that it was concerning memory consumption. It is funny, but this fact in your own linked answer proves that your own present answer ``lst[:] = [i for i in lst if i > 3]``, as the one of Ignacio Vazquez-Abrams, doesn't provide a way to do an in-place transformation, since the outer space out of the list is contributing to the mechanism. – eyquem Nov 07 '11 at 17:46
  • Now , it seems to me that the problem lies in the difference of understanding the phrase "in-place". I understand it as "the process is purely in-place", and you,Ignacio and agf seem to see it as meaning "the result remains in-place". But excuse me, I don't see what the semantics has to do in the matter. – eyquem Nov 07 '11 at 17:50
  • 2
    @eyquem I'd describe this as in-place, using O(n) extra storage. What you're describing as "purely in-place", like my `deque` example, I'd call in-place using O(1) extra storage. I think "in-place" applies to both; sometimes people just need to mutate an existing object when they ask for in-place, sometimes they are trying to minimize memory use. – agf Nov 09 '11 at 23:12
14

The other answers are correct; if you want all the names pointing to the old list to point to the new list you can use slice assignment.

However, that's not truly in-place creation; the new list is first created elsewhere. The link in Sven's answer is good.

The reason there isn't one that truly operates in-place is that while making a new list like that is O(n), each truly in-place item removal would be O(k) by itself, where k is the length of the list from the removal point on. The only way to avoid that with Python lists is to use some temporary storage, which is what you're doing by using slice assignment.

An example of an in-place O(n) filter on a collections.deque, in case you don't need to store your data in a list:

from collections import deque

def dequefilter(deck, condition):
    for _ in xrange(len(deck)):
        item = deck.popleft()
        if condition(item):
            deck.append(item)

deck = deque((1, 2, 3, 4, 5))
dequefilter(deck, lambda x: x > 2) # or operator.gt(2)
print deck
# deque([3, 4, 5])
agf
  • 171,228
  • 44
  • 289
  • 238
  • Thank you very much, agf (and the other guys :) I was really wondering about the reason. – Zaur Nasibov Nov 07 '11 at 14:08
  • 4
    It would be possible to implement a low-level in-place filter function which takes linear time, by using read *and* write pointers into the list. – Fred Foo Nov 07 '11 at 14:09
  • Btw, in case of linked lists - filtering will take O(n) for the whole list. I suppose you meant O(k) for vector-like structures? – Zaur Nasibov Nov 07 '11 at 14:12
  • @BasicWolf, Python's lists *are* vector-like structures, so the statement about O(k) is true for Python lists and a naive delete every element we don't want approach. There are no native linked-list structures in Python; you can obviously code them if you need them but using the built-in types is almost always preferable. – Duncan Nov 07 '11 at 14:26
  • @BasicWolf Duncan's got it. You asked about `list`s, that's the answer for `list`s. – agf Nov 07 '11 at 14:35
  • @Duncan `collections.deque` has constant time left appends / pops so you could implement an O(n) in-place filter with it, but the situations where that would be needed vs. a slice-assigned list comprehension are probably few and far between. I've added an example to my answer just as an exercise. – agf Nov 07 '11 at 14:49
  • @agf, Nice. As you say though it is hard to think of a situation where that would actually be preferred over the simple slice assignment. Slice assigning also has the advantage that the data structure is always valid. i.e. either unfiltered or filtered. The dequeue or other techniques such as using separate read/write pointers leave you with inconsistent data during the process. That may not matter, but since by definition you only case about in-place when there are multiple references to the data it's a potential pitfall. – Duncan Nov 07 '11 at 15:55
  • 1
    @agf: The slice assignment is a single byte-code instruction. When using CPython with threading, it will be atomic due to the GIL. – Sven Marnach Nov 07 '11 at 20:57
  • @SvenMarnach Good to know. For some reason I assumed it wouldn't be a single (Python level) instruction. – agf Nov 07 '11 at 20:59
  • To expand on FredFoo's comment, it's possible to implement it in a high level language (Python) too (without any additional memory usage) by maintaining the read/write indices; however code implemented in Python is usually slower than in C (*note*: not benchmarked), so `[:]` is good if you must keep the original reference for [some reasons](https://stackoverflow.com/questions/8037455/how-to-modify-python-collections-by-filtering-in-place#comment9840496_8037476). – user202729 Dec 21 '20 at 05:50
4

Maybe I'm slightly late, but since no other "O(n) time/O(1) memory" solutions have been posted, and some people even claimed that it is impossible, I think I should post this.

# Retains the elements of xs for which p returned true 
def retain(xs, p):
    w = 0
    for x in xs:
        if p(x):
            xs[w] = x
            w += 1
    del xs[w:]
Vladislav
  • 41
  • 1
4

Correcting @larsmans original solution, you could either do

    i = 0
    while i < len(lst):
        if lst[i] <= 3:
            del lst[i]
        else:
            i += 1

or

    i = len(lst)
    while i > 0:
        if lst[i-1] <= 3:
            del lst[i-1]
        i -= 1

The reason is the "index shift" which happens with the del. If I del at a certain index, that index needs to be re-examined because it now holds a different value.

glglgl
  • 89,107
  • 13
  • 149
  • 217
  • While this _corrects_ larsman's solution, it doesn't _improve_ it -- it's still quadratic time. – agf Nov 07 '11 at 14:49
  • hm, that depends on the definition. It does not improve its performants, but its usability. A correctly working solution is a better one than a faulty one, and IMHO "improve" mainly means "make better". – glglgl Nov 07 '11 at 14:53
  • 2
    Sometimes, I don't understand people. At this minute I write, Sven's answer, that isn't an exact answer for the question, has been upvoted 7 times, while this glglgl's answer that gives a correct answer in the same time it corrects another erroneous answer has no upvote. Moreover, beginning with ``i = len(lst]`` and decrementing **i** avoids to computes ``len(lst)`` at each turn of the loop, and that's tricky. Instead of being upvoted, this good answer receives a critical remark that is not in cause in the question (because concerning the speed). Personally , I evidently +1 – eyquem Nov 07 '11 at 17:05
1

The lst[:] solution by @Sven Marnach is one option. You can also perform this operation in-place, using constant extra memory, with

>>> i = 0
>>> while i < len(lst):
...  if lst[i] <= 3:
...   del lst[i]
...  else:
...   i += 1
... 
>>> lst
[4, 5, 6]

... but this solution is not very readable and takes quadratic time due to all the element shifting involved.

Fred Foo
  • 355,277
  • 75
  • 744
  • 836
  • But each `del lst[i]` takes linear time, which is why this doesn't exist by default. – agf Nov 07 '11 at 14:06
  • @agf: where I wrote constant the second time, I meant quadratic. Updated, thanks. – Fred Foo Nov 07 '11 at 14:08
  • 1
    When I try this with the original list, my resulting list becomes `[2, 4, 5, 6]`. Writing an own answer. – glglgl Nov 07 '11 at 14:18
  • 1
    It took me 10 minutes to understand that this answer had been edited after glglgl's comment without signaling it :( – eyquem Nov 07 '11 at 16:54
  • Thx for hint - I refer now to the old version of his post, honouring his edit. – glglgl Nov 07 '11 at 17:15
0

Because it's not needed.

lst[:] = [i for i in lst if i > 3]
Community
  • 1
  • 1
Ignacio Vazquez-Abrams
  • 776,304
  • 153
  • 1,341
  • 1,358
0

I think it's in place transformation;

lst = [1,2,3,4,5,6,7,8,9,10,11]
to_exclude = [8,4,11,9]
print 'lst == %s\nto_exclude == %s' % (lst,to_exclude)

for i in xrange(len(lst)-1,-1,-1):
    if lst[i] in to_exclude:
        lst.pop(i)

print '\nlst ==',lst

result

lst == [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
to_exclude == [8, 4, 11, 9]

lst == [1, 2, 3, 5, 6, 7, 10]
eyquem
  • 26,771
  • 7
  • 38
  • 46