
Occasionally I want to take every nth element from a list A and put it in list B, and put all other elements into list C. Creating list B is basic Python slicing. Is there an elegant way to create list C?

For example:

A = [0, 1, 2, 3, 4, 5, 6]
B = A[::3] # B = [0, 3, 6]
C = ???    # C = [1, 2, 4, 5]

The best I can come up with is this:

C = [x for x in A if x not in B]

But it seems silly to check membership for every element when we know mathematically which ones should be included, especially because the scenario I am curious about tends to be making train/val/test splits in machine learning, where the lists can be very long. I am also open to elegant NumPy solutions, but am curious whether one exists in pure Python as well.

jr15
  • FWIW, if you don't need it to be deterministic, why not shuffle your A list and then do a simpler split? Something like `B, C = A[:training_size], A[training_size:]`. That's how I used to create my train/dev/test sets :) – NiziL Oct 08 '21 at 06:51
  • `C = list(set(A).difference(B))` and then you've got `C` – Albo Oct 08 '21 at 06:52
  • Otherwise, you should find what you need [on this question](https://stackoverflow.com/questions/11434599/remove-list-from-list-in-python). (Watch out with the `set` approach: it only works if the list contains no duplicates and the ordering does not matter.) – NiziL Oct 08 '21 at 06:53
  • `C = [d for i, d in enumerate(A) if i % 3]` – Olvin Roght Oct 08 '21 at 08:07
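
To make the caveat in the comments above concrete, here is a small sketch comparing the value-based approaches with the index-based one on a list that contains duplicates. The sample list and the variable names `by_value`, `by_set` and `by_index` are made up for illustration.

```python
A = [5, 1, 5, 2, 1, 5, 7]   # hypothetical input containing duplicates
n = 3

B = A[::n]                   # [5, 2, 7]

# Membership test: drops every value that also occurs in B, wherever it sits.
by_value = [x for x in A if x not in B]

# Set difference: same problem, and the order of the result is not guaranteed.
by_set = list(set(A).difference(B))

# Index test: keeps exactly the positions that the slice skipped.
by_index = [x for i, x in enumerate(A) if i % n]

print(by_value)  # [1, 1]
print(by_index)  # [1, 5, 1, 5]
```

Only the index-based version returns the elements at positions 1, 2, 4 and 5; the other two silently lose the 5s that were not selected into B.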

4 Answers


Pure Python

As you already stated, constructing list B is easy. List C can be constructed with compress and cycle from itertools:

from itertools import compress, cycle

C = list(compress(A, cycle([0, 1, 1])))
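
The selector pattern `[0, 1, 1]` is specific to n = 3. A minimal sketch of how it might be generalised to any step is shown here; the helper name `every_nth_and_rest` is hypothetical, and it assumes `n >= 1`.

```python
from itertools import compress, cycle


def every_nth_and_rest(items, n):
    """Return (every n-th item starting at index 0, all remaining items)."""
    picked = items[::n]
    # Selector pattern: 0 drops a position, 1 keeps it; cycle repeats the pattern.
    selectors = cycle([0] + [1] * (n - 1))
    rest = list(compress(items, selectors))
    return picked, rest


A = [0, 1, 2, 3, 4, 5, 6]
B, C = every_nth_and_rest(A, 3)
print(B)  # [0, 3, 6]
print(C)  # [1, 2, 4, 5]
```

Because compress stops as soon as the data is exhausted, the infinite cycle is safe to use as the selector.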

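By the way, in CPython this should be faster than a simple loop, because compress and cycle do the filtering in C. Note that compress returns a lazy iterator, so the benchmark has to consume it with list to time the actual filtering:

from itertools import compress, cycle
from timeit import timeit


def simple_loop(a):
    # Keep every element whose index is not a multiple of 3.
    result = []
    for pos in range(len(a)):
        if pos % 3:
            result.append(a[pos])
    return result


def one_liner(a):
    # compress is lazy; wrap it in list so the filtering is actually timed.
    return list(compress(a, cycle([0, 1, 1])))


print('timeit simple_loop:')
print(timeit('simple_loop(list(range(1000)))', number=1000, globals=globals()))
print('timeit one_liner:')
print(timeit('one_liner(list(range(1000)))', number=1000, globals=globals()))

# Output from a run in which one_liner returned the bare compress iterator,
# so the second figure excludes the filtering work:
# timeit simple_loop:
# 0.1398815
# timeit one_liner:
# 0.013991699999999996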

Numpy

Again, constructing B is easy, and constructing C uses a boolean mask:

import numpy as np

A = np.arange(10)
B = A[::3]
mask = np.ones(A.size, dtype=bool)
mask[::3] = 0
C = A[mask]
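
Since the question mentions train/val/test splits, the same boolean-mask idea can be stretched to more than two groups by masking on the position modulo a period. This is only a sketch: the period of 5 and the assignment of remainders to `test`, `val` and `train` are arbitrary choices made for illustration.

```python
import numpy as np

A = np.arange(10)
period = 5  # hypothetical: one test and one val sample out of every 5 items

# Position of each element within its block of `period` items.
pos = np.arange(A.size) % period

test = A[pos == 0]   # every 5th element, starting at index 0
val = A[pos == 1]    # every 5th element, starting at index 1
train = A[pos >= 2]  # everything else

print(test)   # [0 5]
print(val)    # [1 6]
print(train)  # [2 3 4 7 8 9]
```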
Kirill

Here is one NumPy solution. The idea is to create a boolean index array for B and then invert it with numpy.logical_not to get the index array for C.

Verbose version :

>>> import numpy as np
>>> N = 10
>>> A = np.arange(N)
>>> iB = np.zeros(A.shape, dtype=bool)
>>> iB[::3] = 1
>>> iB
array([ True, False, False,  True, False, False,  True, False, False,
        True])
>>> B = A[iB]
>>> B
array([0, 3, 6, 9])
>>> iC = np.logical_not(iB)
>>> C = A[iC]
>>> C
array([1, 2, 4, 5, 7, 8])

Short version using the syntactic sugar suggested by Albo:

idx = np.zeros(A.shape, dtype=bool)
idx[::3] = 1
B = A[idx]
C = A[~idx]
NiziL

Not a NumPy-based solution, just pure Python:

from typing import Any, List, Tuple


def slicer(lst: List[Any], n: int) -> Tuple[List[Any], List[Any]]:
    """Split lst into (every n-th element, all other elements)."""
    a = []
    b = []
    for i, x in enumerate(lst):
        if i % n == 0:
            a.append(x)
        else:
            b.append(x)
    return a, b

the_list = [0, 1, 2, 3, 4, 5, 6]
x,y = slicer(the_list,3)
print(x)
print(y)

Output:

[0, 3, 6]
[1, 2, 4, 5]
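
Another pure-Python option, not part of the answer above, is to copy the list and delete the same extended slice that was used to build B. This is a minimal sketch that relies only on built-in list behaviour (`del` with a step slice).

```python
A = [0, 1, 2, 3, 4, 5, 6]
n = 3

B = A[::n]    # every n-th element
C = A.copy()  # shallow copy, so A itself is left untouched
del C[::n]    # remove the same positions that B was built from

print(B)  # [0, 3, 6]
print(C)  # [1, 2, 4, 5]
```

This avoids an explicit loop and any membership test, at the cost of one extra copy of the list.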
balderman

A couple of NumPy possibilities:

import numpy as np

A = np.array([0, 1, 2, 3, 4, 5, 6])

A[np.resize(np.arange(3).astype(bool),A.size)]
# array([1, 2, 4, 5])
A.repeat(np.resize(np.arange(3).clip(None,1),A.size))
# array([1, 2, 4, 5])
np.delete(A,np.arange(0,A.size,3))
# array([1, 2, 4, 5])
A[np.arange(A.size)%3!=0]
# array([1, 2, 4, 5])
loopy walt