37

Example:

import numpy as np

n = 8

# masking lists
lst = list(range(n))
print(lst)

# the mask (filter)
msk = [(el > 3) and (el <= 6) for el in lst]
print(msk)

# use of the mask
print([lst[i] for i in range(len(lst)) if msk[i]])

# masking arrays
ary = np.arange(n)
print(ary)

# the mask (filter)
msk = (ary > 3) & (ary <= 6)
print(msk)

# use of the mask
print(ary[msk])  # very elegant

and the results are:

>>> 
[0, 1, 2, 3, 4, 5, 6, 7]
[False, False, False, False, True, True, True, False]
[4, 5, 6]
[0 1 2 3 4 5 6 7]
[False False False False  True  True  True False]
[4 5 6]

As you can see, masking an array is more elegant than masking a list. If you try to apply the array masking syntax to a list, you get an error:

>>> lst[msk]
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
TypeError: only integer arrays with one element can be converted to an index

The question is: how can a list be masked just as elegantly?

Update:
The answer by jamylak was accepted for introducing compress. However, the points made by Joel Cornett completed the solution in the form I was looking for:

>>> mlist = MaskableList
>>> mlist(lst)[msk]
[4, 5, 6]
Robert Harvey
Developer

6 Answers

61

If you are using numpy:

>>> import numpy as np
>>> a = np.arange(8)
>>> mask = np.array([False, False, False, False, True, True, True, False], dtype=bool)
>>> a[mask]
array([4, 5, 6])

If you are not using numpy, you are looking for itertools.compress:

>>> from itertools import compress
>>> a = range(8)
>>> mask = [False, False, False, False, True, True, True, False]
>>> list(compress(a, mask))
[4, 5, 6]
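
As a minimal sketch, assuming Python 3, the mask from the question can be built from a condition and applied with compress in one go. The variable names here are illustrative only:

```python
from itertools import compress

data = list(range(8))

# Build the boolean mask from a condition, as in the question.
mask = [3 < x <= 6 for x in data]

# compress() lazily yields the items whose mask entry is truthy.
selected = list(compress(data, mask))
print(selected)  # [4, 5, 6]
```

Because compress returns an iterator, wrap it in list() only when an actual list is needed.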
jamylak
  • by far the best solution here – Derek Eden Sep 07 '21 at 02:00
  • @jamylak : from Python 3+ it seems to be `zip()`, instead of `izip()` – Pierre Oct 19 '21 at 09:12
  • @Pierre I have updated the answer now. I removed that code snippet because it may have been misleading anyway and it can be viewed through the link. Also I think since the original question used `numpy` it's important to highlight `numpy` – jamylak Oct 20 '21 at 08:06
16

If you are already using NumPy, you can convert the list to a NumPy array and index it with the mask, without installing any other library:

>>> import numpy as np
>>> a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> msk = [True, False, False, True, True, True, True, False, False, False]
>>> a = np.array(a)  # convert list to numpy array
>>> result = a[msk]  # mask a
>>> result.tolist()
[0, 3, 4, 5, 6]
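
One thing to watch for with this approach: the mask has to contain booleans. A list of 0/1 integers is interpreted by NumPy as a list of positions rather than as a mask. A small sketch of the difference (the name `int_mask` is just for illustration):

```python
import numpy as np

a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
int_mask = [1, 0, 0, 1, 1, 1, 1, 0, 0, 0]  # hypothetical 0/1 mask

# Integer entries are treated as positions, not as a mask.
as_indices = a[int_mask].tolist()

# Converting to bool first gives mask semantics.
as_mask = a[np.asarray(int_mask, dtype=bool)].tolist()

print(as_indices)  # [1, 0, 0, 1, 1, 1, 1, 0, 0, 0]
print(as_mask)     # [0, 3, 4, 5, 6]
```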
biendltb
7

Since jamylak already answered the question with a practical answer, here is my example of a list with builtin masking support (totally unnecessary, btw):

from itertools import compress
class MaskableList(list):
    def __getitem__(self, index):
        try: return super(MaskableList, self).__getitem__(index)
        except TypeError: return MaskableList(compress(self, index))

Usage:

>>> myList = MaskableList(range(10))
>>> myList
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> mask = [0, 1, 1, 0]
>>> myList[mask]
[1, 2]

Note that compress stops when either the data or the mask runs out. If you wish to keep the portion of the list that extends past the length of the mask, you could try something like:

from itertools import zip_longest  # named izip_longest in Python 2

[i[0] for i in zip_longest(myList, mask[:len(myList)], fillvalue=True) if i[1]]
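
For reference, here is a sketch of both ideas side by side, assuming Python 3. The helper name `mask_keep_tail` is made up for this example and is not part of any library:

```python
from itertools import compress, zip_longest


class MaskableList(list):
    """List that also accepts a sequence of truthy/falsy flags as an index."""

    def __getitem__(self, index):
        try:
            return super().__getitem__(index)
        except TypeError:
            # Not an int or slice: treat the index as a mask.
            return MaskableList(compress(self, index))


def mask_keep_tail(items, mask):
    """Hypothetical helper: items past the end of the mask are kept."""
    pairs = zip_longest(items, mask[:len(items)], fillvalue=True)
    return [item for item, keep in pairs if keep]


my_list = MaskableList(range(10))
short_mask = [0, 1, 1, 0]

print(my_list[short_mask])                  # [1, 2]
print(mask_keep_tail(my_list, short_mask))  # [1, 2, 4, 5, 6, 7, 8, 9]
```

Ordinary integer and slice indexing still goes through the normal list path; only an index that list itself rejects falls back to masking.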
Joel Cornett
  • +1 Thank you for addressing the same use of masking on lists as arrays by proposing MaskableList. It looks very interesting and works very well as my desire. A quick note is that these are a bit slower compared to array masking. I added your points as updates. – Developer Apr 23 '12 at 05:33
  • I tried your solution of a `MaskableList`, but I have some issues re-instantiate it. For each element in a loop I want to mask this by a new list: `for i in arange(0,n): fts = MaskableList(F) sorter = argsort(A) result[i] = zip(fts[sorter],A[sorter])` but each iteration, fts[sorter] contains the same values, whereas sorter is different each time. I normally use python rather as a script language and thus I am not that familiar with objects. – Milla Well Jan 26 '13 at 14:12
  • @Developer: I haven't tested it specifically, but one reason that `MaskableList` could be considerably slower is because of the slightly expensive exception handling that's going on. Try switching the `try...except` around, so that it attempts to mask by default. – Joel Cornett Jan 28 '13 at 13:42
  • @MillaWell: I'm not familiar with `argsort`. Also, what is `A`, and what are the contents of `F`? – Joel Cornett Jan 28 '13 at 13:43
  • @JoelCornett : argsort sorts an array and returns the list of indices of the original. `A=[3.5,2.0,1.1,4.0]`; `argsort(A)` would return `[2,1,0,3]`. `F` is just a un-maskable list, let us say ["A","B","C","D"], thus `zip(fts[sorter],A[sorter])` should output: `{"A":1.1,"B":2.0,"C":3.5,"D":4.0}` – Milla Well Feb 01 '13 at 17:48
  • @MillaWell: Ah, well your first problem is that `MaskableList` doesn't do what you think it does. It returns the result of a binary mask (1, 0, or True/False) on a list. It won't reorder the elements according to a list of indices. Secondly, `zip(fts[sorter], A[sorter])` would output a list of tuples, but you have a dict. – Joel Cornett Feb 02 '13 at 03:30
  • @MillaWell: If I have a list `myList` and a list of indices `b = argsort(A)`, I would do `newList = [myList[i] for i in b]` to achieve the desired result. – Joel Cornett Feb 02 '13 at 03:32
  • First I need to correct my example, the desired result is of course {"C":1.1,"B":2.0,"A":3.5,"D":4.0}. And Second: I actually do use a loop to achieve this, but was curious, why there is no way to have `1-d array` indexing. This would look much smarter, since I wouldn't have to use another for loop – Milla Well Feb 02 '13 at 11:32
  • @MillaWell: Hmmm... Well, technically speaking even the fastest array indexing would require you to use a loop at some level of operation (whether that's in Python or in the underlying C code is dependent on your use of `map()`, `itertools` and other optimized tools). Since this seems to be a very long comment thread, may I suggest that you post a question regarding our conversation? I'm not super familiar with `numpy` and the solution you seek may already be out there. – Joel Cornett Feb 02 '13 at 12:36
  • @JoelCornett I posted a question here: http://stackoverflow.com/questions/14664333/1d-list-indexing-python-enhance-maskablelist thanks again for your ideas. – Milla Well Feb 02 '13 at 17:15
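
The distinction drawn in the comments above, between a boolean mask and a list of indices, can be sketched in plain Python. This assumes the example values from the thread and uses `sorted` with a key as a stand-in for `argsort`:

```python
values = [3.5, 2.0, 1.1, 4.0]
labels = ["A", "B", "C", "D"]

# Pure-Python stand-in for argsort: positions that would sort `values`.
order = sorted(range(len(values)), key=values.__getitem__)

# Index selection: pick elements by position, in the given order.
pairs = [(labels[i], values[i]) for i in order]

print(order)  # [2, 1, 0, 3]
print(pairs)  # [('C', 1.1), ('B', 2.0), ('A', 3.5), ('D', 4.0)]
```

A mask can only keep or drop elements in their original order, whereas an index list can also reorder or repeat them.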
4

I don't consider it elegant. It's compact, but it tends to be confusing, because the construct is very different from what most other languages offer.

As Guido van Rossum has said about language design, we spend more time reading code than writing it. The more obscure the construction of a line of code, the more confusing it becomes to others, who may lack familiarity with Python even though they are fully competent in any number of other languages.

Readability trumps short-form notation every day in the real world of servicing code. It is just like fixing your car: big drawings with lots of information make troubleshooting a lot easier.

For me, I would much rather troubleshoot someone's code that uses the long form

print([lst[i] for i in range(len(lst)) if msk[i]])

than the numpy short notation mask. I don't need to have any special knowledge of a specific Python package to interpret it.

GrandMasterFlush
Jim
1

The following works perfectly well in Python 3:

np.array(lst)[msk]

If you need a list back as the result:

np.array(lst)[msk].tolist()
Jake Drew
0

You could also just use a list comprehension with zip:

  1. Define a function
def masklist(mylist,mymask):
    return [a for a,b in zip(mylist,mymask) if b]
  2. Use it!
n = 8
lst = range(n)
msk = [(el>3) and (el<=6) for el in lst]
lst_msk = masklist(lst,msk)
print(lst_msk)
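
Note that zip stops silently at the shorter input, so a mask of the wrong length goes unnoticed. If that matters, a variant with an explicit length check might look like the sketch below (the name `masklist_checked` is hypothetical):

```python
def masklist_checked(items, mask):
    """Hypothetical variant of masklist that rejects mismatched lengths."""
    items, mask = list(items), list(mask)
    if len(items) != len(mask):
        raise ValueError("items and mask must have the same length")
    return [item for item, keep in zip(items, mask) if keep]


lst = list(range(8))
msk = [3 < el <= 6 for el in lst]
print(masklist_checked(lst, msk))  # [4, 5, 6]
```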
brodegon