If you want to keep both and you're okay with using a function, you could use:
    def partition_on_index(it, indices):
        indices = set(indices)  # convert to set for fast lookups
        l1, l2 = [], []
        l_append = (l1.append, l2.append)
        for idx, element in enumerate(it):
            l_append[idx in indices](element)
        return l1, l2
There's one trick involved here: `l_append[idx in indices]`. The expression `idx in indices` evaluates to a boolean, `True` or `False`. Because booleans are a subclass of integers in Python, these are interpreted as `0` (for `False`) or `1` (for `True`) and are thus valid indices for the `l_append` tuple.
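To make the boolean-as-index behaviour concrete, here is a minimal sketch. The `choices` tuple and the small `indices` set are made up purely for illustration and are not part of the function above:

```python
# bool is a subclass of int, so True and False behave like 1 and 0.
print(issubclass(bool, int))   # True
print(True == 1, False == 0)   # True True

# Hypothetical two-element tuple, used only for illustration.
choices = ("not selected", "selected")
indices = {1, 4}

# The membership test yields False/True, which picks element 0 or 1.
picked = [choices[i in indices] for i in range(3)]
print(picked)  # ['not selected', 'selected', 'not selected']
```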
The `l_append` tuple thus serves as a convenient alternative to an `if ... else ...` inside the loop, and it hoists the lookup of the `append` methods out of the loop, which improves the speed of the function a bit. So this is equivalent to:
    def partition_on_index_probably_slower(it, indices):
        indices = set(indices)  # convert to set for fast lookups
        l1, l2 = [], []
        for idx, element in enumerate(it):
            if idx in indices:
                l2.append(element)
            else:
                l1.append(element)
        return l1, l2
But let's see an example of how the first function works:
    >>> idx = [1, 4, 5]
    >>> mylist = ['a', 'b', 'c', 'd', 'e', 'f']
    >>> l1, l2 = partition_on_index(mylist, idx)
    >>> l1
    ['a', 'c', 'd']
    >>> l2
    ['b', 'e', 'f']
Timings:
I used the framework from this answer to measure the performance:
    import random
    import numpy as np

    def func1(mylist, idx):
        idx = set(idx)
        in_idx, not_in_idx = [], []
        for i, e in enumerate(mylist):
            (not_in_idx, in_idx)[i in idx].append(e)
        # note: returns (in, not in), the reverse order of the other two functions
        return in_idx, not_in_idx

    def partition_on_index(it, indices):
        indices = set(indices)  # convert to set for fast lookups
        l1, l2 = [], []
        l_append = (l1.append, l2.append)
        for idx, element in enumerate(it):
            l_append[idx in indices](element)
        return l1, l2

    def func2(mylist, idx):
        x = np.asarray(mylist)
        mask = np.ones(len(mylist), dtype=bool)
        mask[idx] = False  # False marks the positions listed in idx
        return x[mask], x[~mask]

    # Timing setup
    timings = {func1: [], partition_on_index: [], func2: []}
    sizes = [2**i for i in range(1, 20, 2)]

    # Timing
    for size in sizes:
        mylist = list(range(size))
        indices = list({random.randint(0, size-1) for _ in range(size//2)})
        for func in timings:
            # %timeit is an IPython magic; run this in IPython or Jupyter
            res = %timeit -o func(mylist, indices)
            timings[func].append(res)

So for small lists the function `partition_on_index` performs best. But if the input contains several thousand items (or more) you might get faster results using the NumPy approach from Dmitri Chubarov. However, all approaches perform asymptotically equally, and the performance only differs by a factor of 2-5.
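The timing loop above relies on IPython's `%timeit` magic. If you want to try a similar measurement in a plain Python script, a rough sketch using the standard-library `timeit` module could look like this. The helper name `time_partition` and the `repeat`/`number` values are my own assumptions, not part of the framework used for the measurements above, and the numbers it prints will depend on your machine:

```python
import random
import timeit

def partition_on_index(it, indices):
    indices = set(indices)  # convert to set for fast lookups
    l1, l2 = [], []
    l_append = (l1.append, l2.append)
    for idx, element in enumerate(it):
        l_append[idx in indices](element)
    return l1, l2

def time_partition(size, repeat=3, number=10):
    """Hypothetical helper: best per-call time in seconds for one input size."""
    mylist = list(range(size))
    indices = list({random.randint(0, size - 1) for _ in range(size // 2)})
    runs = timeit.repeat(
        lambda: partition_on_index(mylist, indices),
        repeat=repeat,
        number=number,
    )
    # Each entry in runs is the total time for `number` calls.
    return min(runs) / number

for size in (2**3, 2**7, 2**11):
    print(size, time_partition(size))
```

Taking the minimum over the repeats is a common way to reduce noise from other processes, but it is only one possible choice.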