17

This is similar to some other questions (Explicitly select items from a Python list or tuple, Grabbing specific indices of a list in Python), but I'm looking to do the opposite:

What is a clean way to specify a list/tuple of indices to exclude, instead of to select? I'm thinking of something similar to R or MATLAB where you can specify indices to exclude, like:

vector1 <- c('a', 'b', 'c', 'd')
vector2 <- vector1[-1] # ['b', 'c', 'd']
vector3 <- vector1[c(-1, -2)] # ['c', 'd']

Is there a good way to accomplish the same thing in Python? Apologies if this is a dupe; I wasn't sure exactly what to search for.

martineau
Taj Morton

6 Answers

22
>>> to_exclude = {1, 2}
>>> vector = ['a', 'b', 'c', 'd']
>>> vector2 = [element for i, element in enumerate(vector) if i not in to_exclude]

The tricks here are:

  • Use a list comprehension to transform one list into another. (You can also use the filter function, especially if the predicate you're filtering on is already lying around as a function with a nice name.)
  • Use enumerate to get each element and its index together.
  • Use the in operator against any Set or Sequence* type to decide which ones to filter. (A set is most efficient if there are a lot of values, and probably conceptually the right answer… But it really doesn't matter much for just a handful; if you've already got a list or tuple with 4 indices in it, that's a "Set or Sequence" too, so you can just use it.)

* Technically, any Container will do. But most Containers that aren't a Set or Sequence would be silly here.
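Since the answer mentions `filter` as an alternative, here is a sketch of the same exclusion written that way (`keep` is just an illustrative name for the predicate, not anything from the original answer):

```python
to_exclude = {1, 2}
vector = ['a', 'b', 'c', 'd']

def keep(pair):
    """Predicate over the (index, element) pairs produced by enumerate."""
    index, _ = pair
    return index not in to_exclude

# filter keeps the pairs whose index is not excluded; the comprehension
# then throws the indices away again.
vector2 = [element for _, element in filter(keep, enumerate(vector))]
# vector2 == ['a', 'd']
```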

abarnert
8
import numpy
target_list = numpy.array(['1','b','c','d','e','f','g','h','i','j'])
to_exclude = [1, 4, 5]
print(target_list[~numpy.in1d(range(len(target_list)), to_exclude)])

because numpy is fun
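For what it's worth, newer NumPy spells `in1d` as `isin`; a boolean-mask version of the same trick might look like this (a sketch with my own example data, not the answer's original code):

```python
import numpy as np

target_list = np.array(['a', 'b', 'c', 'd', 'e'])
to_exclude = [1, 4]

# Build a mask that is True at the indices we want to keep,
# then use it to index the array.
mask = ~np.isin(np.arange(len(target_list)), to_exclude)
print(target_list[mask])  # ['a' 'c' 'd']
```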

Joran Beasley
    Plus, if you're translating MATLAB code to Python, you probably _should_ be looking at numpy rather than native lists and loops… – abarnert Aug 26 '13 at 19:03
4

Use np.delete

In [38]: a
Out[38]: array([ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13])

In [39]: b
Out[39]: [3, 4, 5, 9]

In [40]: a[b]
Out[40]: array([ 7,  8,  9, 13])

In [41]: np.delete(a, b)
Out[41]: array([ 4,  5,  6, 10, 11, 12])
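The IPython session above can be reproduced as a self-contained script (using the same `a` and `b` shown in the prompts):

```python
import numpy as np

a = np.arange(4, 14)    # array([ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13])
b = [3, 4, 5, 9]

print(a[b])             # fancy indexing *selects*:  [ 7  8  9 13]
print(np.delete(a, b))  # np.delete *excludes*:      [ 4  5  6 10 11 12]
```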
Belter
3

Use enumerate() and exclude any indices you want removed:

[elem for i, elem in enumerate(inputlist) if i not in excluded_indices]

For performance, it'd be fastest if excluded_indices was a set.

Martijn Pieters
  • `set` won't actually be faster than `list` until there are more than a few elements (from a previous question, the cutoff is anywhere between 3 and 12 with strings, depending on your implementation). But conceptually it makes more sense anyway. – abarnert Aug 26 '13 at 18:49
  • @abarnert: Doesn't that depend on the number of elements in the input list as well? And for this filter, it could make a difference if `excluded_indices` is sorted or randomized as well; I am a little skeptical that the cutoff is every anywhere *near* 12; is the fixed cost of the set lookup (hash calculation and lookup, mainly) really that high? – Martijn Pieters Aug 26 '13 at 18:53
  • From what I vaguely remember, with very large `unicode` objects in Python 2.7, I found a case with a cutoff between 6 and 7… but someone else found a case that was almost twice as high, possibly in a different Python implementation. Of course notice the "with strings"; hashing ints is a lot faster, even huge ints, so I'd expect it to be around 2-3 at worst… And I'm not sure how sorting would make a difference (unless you want a third implementation using `bisect` or a tree or something). – abarnert Aug 26 '13 at 18:59
  • @abarnert: Hrm, you are right, sorting doesn't make a difference, the total cost of all the searches is going to be the same no matter what the order. – Martijn Pieters Aug 26 '13 at 19:00
  • And also, how would the number of input elements make a difference? It's going to be linear on those, except in a few edge cases (e.g., if you have lots of references to a small number of distinct slow-to-hash builtin objects, the most important factor could be the number of _unique_ elements.) – abarnert Aug 26 '13 at 19:02
  • Right; there is a fixed cost in a full list scan too; either that is faster or slower than the set membership test, independent of the number of elements in the input list. – Martijn Pieters Aug 26 '13 at 19:21
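A rough way to measure the set-vs-list cutoff debated in these comments on your own machine (numbers will vary with Python version and data, so this is only a sketch):

```python
import timeit

data = list(range(100))
excluded_list = [3, 4, 5, 9]
excluded_set = set(excluded_list)

stmt = '[x for i, x in enumerate(data) if i not in excl]'
for excl in (excluded_list, excluded_set):
    # Time the same comprehension with a list vs. a set of excluded indices.
    t = timeit.timeit(stmt, globals={'data': data, 'excl': excl},
                      number=10_000)
    print(type(excl).__name__, t)
```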
1
numpy.delete(original_list, indices_of_the_excluded_elements)

note that in Python, indexing starts from 0, so for the example in the question the code should be:

import numpy as np
vector1 = ['a', 'b', 'c', 'd']
vector2 = np.delete(vector1, [0])     # ['b' 'c' 'd']
vector3 = np.delete(vector1, [0, 1])  # ['c' 'd']
Joe Ferndz
0

I'll take a different approach, using itemgetter. Just for the fun of it :)

from operator import itemgetter

def exclude(to_exclude, vector):
    "Exclude items with particular indices from a vector."
    # Sort so the kept items come back in their original order
    # (set iteration order is not guaranteed).
    to_keep = sorted(set(range(len(vector))) - set(to_exclude))
    return itemgetter(*to_keep)(vector)
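One caveat with `itemgetter`: given a single index it returns the bare item rather than a 1-tuple, so the result type of the function above depends on how many items survive. A more defensive sketch that always returns a list (`exclude_list` is an illustrative name of my own, not from the answer):

```python
from operator import itemgetter

def exclude_list(to_exclude, vector):
    "Exclude items with particular indices; always return a list, in order."
    to_keep = sorted(set(range(len(vector))) - set(to_exclude))
    if not to_keep:
        return []
    result = itemgetter(*to_keep)(vector)
    # itemgetter(i) returns the item itself, not a tuple, for one index.
    return list(result) if len(to_keep) > 1 else [result]

print(exclude_list([0], ['a', 'b', 'c', 'd']))        # ['b', 'c', 'd']
print(exclude_list([0, 1, 2], ['a', 'b', 'c', 'd']))  # ['d']
```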
Emiel