17

This is similar to some other questions (Explicitly select items from a Python list or tuple, Grabbing specific indices of a list in Python), but I'm looking to do the opposite:

What is a clean way to specify a list/tuple of indices to exclude, instead of to select? I'm thinking of something similar to R or MATLAB where you can specify indices to exclude, like:

vector1 <- c('a', 'b', 'c', 'd')
vector2 <- vector1[-1] # ['b', 'c', 'd']
vector3 <- vector1[c(-1, -2)] # ['c', 'd']

Is there a good way to accomplish the same thing in Python? Apologies if this is a dupe; I wasn't sure exactly what to search for.

martineau
Taj Morton

6 Answers

22
>>> to_exclude = {1, 2}
>>> vector = ['a', 'b', 'c', 'd']
>>> vector2 = [element for i, element in enumerate(vector) if i not in to_exclude]

The tricks here are:

  • Use a list comprehension to transform one list into another. (You can also use the filter function, especially if the predicate you're filtering on is already lying around as a function with a nice name.)
  • Use enumerate to get each element and its index together.
  • Use the in operator against any Set or Sequence* type to decide which ones to filter. (A set is most efficient if there are a lot of values, and probably conceptually the right answer… But it really doesn't matter much for just a handful; if you've already got a list or tuple with 4 indices in it, that's a "Set or Sequence" too, so you can just use it.)

* Technically, any Container will do. But most Containers that aren't a Set or Sequence would be silly here.
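Since the answer mentions `filter` as an alternative, here is a sketch of the same exclusion written that way (`keep` is just an illustrative name for the predicate, not anything from the original answer):

```python
to_exclude = {1, 2}
vector = ['a', 'b', 'c', 'd']

def keep(pair):
    """Predicate over the (index, element) pairs produced by enumerate."""
    index, _ = pair
    return index not in to_exclude

# filter keeps the pairs whose index is not excluded; the comprehension
# then throws the indices away again.
vector2 = [element for _, element in filter(keep, enumerate(vector))]
# vector2 == ['a', 'd']
```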

abarnert
8
import numpy
target_list = numpy.array(['1','b','c','d','e','f','g','h','i','j'])
to_exclude = [1, 4, 5]
print(target_list[~numpy.in1d(range(len(target_list)), to_exclude)])

because numpy is fun
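For what it's worth, newer NumPy spells `in1d` as `isin`; a boolean-mask version of the same trick might look like this (a sketch with my own example data, not the answer's original code):

```python
import numpy as np

target_list = np.array(['a', 'b', 'c', 'd', 'e'])
to_exclude = [1, 4]

# Build a mask that is True at the indices we want to keep,
# then use it to index the array.
mask = ~np.isin(np.arange(len(target_list)), to_exclude)
print(target_list[mask])  # ['a' 'c' 'd']
```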

Joran Beasley
    Plus, if you're translating MATLAB code to Python, you probably _should_ be looking at numpy rather than native lists and loops… – abarnert Aug 26 '13 at 19:03
4

Use np.delete

In [38]: a
Out[38]: array([ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13])

In [39]: b
Out[39]: [3, 4, 5, 9]

In [40]: a[b]
Out[40]: array([ 7,  8,  9, 13])

In [41]: np.delete(a, b)
Out[41]: array([ 4,  5,  6, 10, 11, 12])
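The IPython session above can be reproduced as a self-contained script (using the same `a` and `b` shown in the prompts):

```python
import numpy as np

a = np.arange(4, 14)    # array([ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13])
b = [3, 4, 5, 9]

print(a[b])             # fancy indexing *selects*:  [ 7  8  9 13]
print(np.delete(a, b))  # np.delete *excludes*:      [ 4  5  6 10 11 12]
```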
Belter
3

Use enumerate() and exclude any indices you want removed:

[elem for i, elem in enumerate(inputlist) if i not in excluded_indices]

For performance, it'd be fastest if excluded_indices was a set.

Martijn Pieters
  • `set` won't actually be faster than `list` until there are more than a few elements (from a previous question, the cutoff is anywhere between 3 and 12 with strings, depending on your implementation). But conceptually it makes more sense anyway. – abarnert Aug 26 '13 at 18:49
  • @abarnert: Doesn't that depend on the number of elements in the input list as well? And for this filter, it could make a difference if `excluded_indices` is sorted or randomized as well; I am a little skeptical that the cutoff is every anywhere *near* 12; is the fixed cost of the set lookup (hash calculation and lookup, mainly) really that high? – Martijn Pieters Aug 26 '13 at 18:53
  • From what I vaguely remember, with very large `unicode` objects in Python 2.7, I found a case with a cutoff between 6 and 7… but someone else found a case that was almost twice as high, possibly in a different Python implementation. Of course notice the "with strings"; hashing ints is a lot faster, even huge ints, so I'd expect it to be around 2-3 at worst… And I'm not sure how sorting would make a difference (unless you want a third implementation using `bisect` or a tree or something). – abarnert Aug 26 '13 at 18:59
  • @abarnert: Hrm, you are right, sorting doesn't make a difference, the total cost of all the searches is going to be the same no matter what the order. – Martijn Pieters Aug 26 '13 at 19:00
  • And also, how would the number of input elements make a difference? It's going to be linear on those, except in a few edge cases (e.g., if you have lots of references to a small number of distinct slow-to-hash builtin objects, the most important factor could be the number of _unique_ elements.) – abarnert Aug 26 '13 at 19:02
  • Right; there is a fixed cost in a full list scan too; either that is faster or slower than the set membership test, independent of the number of elements in the input list. – Martijn Pieters Aug 26 '13 at 19:21
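A rough way to measure the set-vs-list cutoff debated in these comments on your own machine (numbers will vary with Python version and data, so this is only a sketch):

```python
import timeit

data = list(range(100))
excluded_list = [3, 4, 5, 9]
excluded_set = set(excluded_list)

stmt = '[x for i, x in enumerate(data) if i not in excl]'
for excl in (excluded_list, excluded_set):
    # Time the same comprehension with a list vs. a set of excluded indices.
    t = timeit.timeit(stmt, globals={'data': data, 'excl': excl},
                      number=10_000)
    print(type(excl).__name__, t)
```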
1
numpy.delete(original_list, indices_of_the_excluded_elements)

note that in Python, indexing starts from 0, so for the example in the question the code should be:

import numpy as np
vector1 = ['a', 'b', 'c', 'd']
vector2 = np.delete(vector1, [0])     # ['b' 'c' 'd']
vector3 = np.delete(vector1, [0, 1])  # ['c' 'd']
Joe Ferndz
0

I'll take a different approach, using itemgetter. Just for the fun of it :)

from operator import itemgetter

def exclude(to_exclude, vector):
    "Exclude items with particular indices from a vector."
    # Sort so the kept items come back in their original order
    # (set iteration order is not guaranteed).
    to_keep = sorted(set(range(len(vector))) - set(to_exclude))
    return itemgetter(*to_keep)(vector)
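One caveat with `itemgetter`: given a single index it returns the bare item rather than a 1-tuple, so the result type of the function above depends on how many items survive. A more defensive sketch that always returns a list (`exclude_list` is an illustrative name of my own, not from the answer):

```python
from operator import itemgetter

def exclude_list(to_exclude, vector):
    "Exclude items with particular indices; always return a list, in order."
    to_keep = sorted(set(range(len(vector))) - set(to_exclude))
    if not to_keep:
        return []
    result = itemgetter(*to_keep)(vector)
    # itemgetter(i) returns the item itself, not a tuple, for one index.
    return list(result) if len(to_keep) > 1 else [result]

print(exclude_list([0], ['a', 'b', 'c', 'd']))        # ['b', 'c', 'd']
print(exclude_list([0, 1, 2], ['a', 'b', 'c', 'd']))  # ['d']
```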
Emiel