The most efficient way is certainly to use sets for everything, as in Ronie Martinez's answer.
I'll take the liberty of quoting Ronie's code here, for ease of reference (although with a slight edit to use os.path.getsize):
import os

files = [...]
files_0 = {f for f in files if os.path.getsize(f) == 0}
files = set(files) - files_0
As an addendum, here is what you could do if you actually care about the ordering for some reason:
files = [...]
files_0 = [f for f in files if os.path.getsize(f) == 0]
Followed by either:
files = [f for f in files if f not in files_0]
or better (because sets are much quicker than lists for testing inclusion):
files_0_set = set(files_0)
files = [f for f in files if f not in files_0_set]
In this case, although we don't specifically want to output files_0_set, it is a useful intermediate.
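For anyone who wants to try the order-preserving version end to end, here is a minimal sketch. The temporary directory and the file names (a.txt and so on) are made up purely for illustration; only the three filtering lines come from the answer itself.

```python
import os
import tempfile

# Hypothetical setup: create four temporary files, two of them empty.
with tempfile.TemporaryDirectory() as tmp:
    files = []
    for name, content in [("a.txt", b"data"), ("b.txt", b""),
                          ("c.txt", b"more"), ("d.txt", b"")]:
        path = os.path.join(tmp, name)
        with open(path, "wb") as fh:
            fh.write(content)
        files.append(path)

    # The filtering itself: collect the empty files, then rebuild the list
    # in its original order, using a set for fast membership tests.
    files_0 = [f for f in files if os.path.getsize(f) == 0]
    files_0_set = set(files_0)
    files = [f for f in files if f not in files_0_set]

    kept_names = [os.path.basename(f) for f in files]
    empty_names = [os.path.basename(f) for f in files_0]

print(kept_names)   # non-empty files, original order preserved
print(empty_names)  # the empty files that were filtered out
```

Both lists keep the order of the original files list, which is the only thing the plain set-difference version gives up.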
By the way, the original code attempts to remove items from a list while iterating over it. This will not work reliably: the loop advances by position, so each removal shifts the remaining items one place to the left and the item immediately after the removed one is skipped. It is fine to use an explicit loop if desired, instead of a set comprehension or list comprehension, but the removal of items must be deferred until after the loop.
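To make the problem concrete, here is a small sketch that uses a made-up dictionary of sizes in place of real files (the names and sizes are hypothetical, so no file system access is needed). It contrasts removing during iteration with deferring the removal.

```python
# Hypothetical stand-in for os.path.getsize: name -> size in bytes.
sizes = {"a": 0, "b": 0, "c": 5, "d": 0, "e": 0}
names = ["a", "b", "c", "d", "e"]

# Buggy: removing from the list that is being iterated over.
buggy = list(names)
for f in buggy:
    if sizes[f] == 0:
        buggy.remove(f)  # shifts later items left, so the next one is skipped

# Correct: decide what to remove first, then remove in a separate loop.
fixed = list(names)
to_remove = [f for f in fixed if sizes[f] == 0]
for f in to_remove:
    fixed.remove(f)

print(buggy)  # still contains some zero-size entries
print(fixed)  # contains only the non-empty entry
```

In the buggy version, "b" and "e" each directly follow a removed item, so the loop never examines them and they survive even though their size is zero.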
Additionally, Ronie is correct that removing items from the middle of a list is slow (each call to remove has to search the list and then shift the later items), and therefore it is likely to be faster simply to construct a new list which excludes certain items. However, one situation where you might not want to do that, and should use remove instead, is when files is one of several references to the same list object, so you want that list itself to be updated rather than just rebinding the files variable to a new list. Here is an example of what you might do in that case. In this example, I will use an explicit loop when constructing files_0 for the sake of clarity, although the list comprehension above could still be used.
files_0 = []
for f in files:
    if os.path.getsize(f) == 0:
        files_0.append(f)
for f in files_0:
    files.remove(f)
The point is that you are doing the removal in a separate loop after the end of the loop over files, and this separate loop is over a different collection (in this case files_0) so you are not removing items from the list that you are iterating over.
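As a side note that goes beyond the code above: if the goal is only to update the existing list object so that other references see the change, slice assignment is another option, and it avoids the repeated remove calls. The sketch below again uses a made-up sizes dictionary instead of real files, and the name alias is hypothetical, standing in for "some other reference to the same list".

```python
# Hypothetical stand-in for os.path.getsize: name -> size in bytes.
sizes = {"a": 0, "b": 3, "c": 0, "d": 7}

files = ["a", "b", "c", "d"]
alias = files  # a second reference to the same list object

# Slice assignment replaces the contents of the existing list in place.
# The comprehension on the right is fully built before the assignment
# happens, so nothing is removed from the list while it is being iterated.
files[:] = [f for f in files if sizes[f] != 0]

print(alias)           # the other reference sees the filtered contents
print(alias is files)  # still the very same list object
```

Whether this is preferable to the remove loop is a matter of taste for short lists; for long lists with many items to drop, rebuilding the contents in one pass should generally be cheaper.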