
I have a 2D NumPy array of shape (3, 3) with dtype=object, whose elements are tuples of the form (str, str, float):

import numpy as np

template = ('Apple', 'Orange', 5.0)
my_array = np.array([None] * 9).reshape((3,3))

for i in range(my_array.shape[0]):
    for j in range(my_array.shape[1]):
        my_array[i, j] = template

But when I try to get a boolean mask:

print(my_array == template)

the result is all False:

[[False False False]
 [False False False]
 [False False False]]

However, comparing a single element with the template still works:

print(my_array[0,0] == template) # This prints True

Why does the boolean mask return all False and how do I make it work?

P.S. I have searched for related topics but couldn't make use of any of them:

Array of tuples in Python
Restructuring Array of Tuples
Apply function to an array of tuples
Filter numpy array of tuples

Oxana Verkholyak

1 Answer


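What is happening here is that NumPy does not treat `template` as a single value. It converts the tuple into a 1-D array of length 3 and broadcasts it against the last axis of `my_array`, so each column is compared with one field of the tuple. So when you do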

my_array == template

what you are actually doing, for every row, is:

('Apple', 'Orange', 5.0) == 'Apple'
('Apple', 'Orange', 5.0) == 'Orange'
('Apple', 'Orange', 5.0) == 5.0
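One way to see this, as a small sketch that rebuilds the array from the question: `np.asarray(template)` is a 1-D array with three entries, and comparing against it gives the same all-False mask, because its length-3 axis is broadcast across the three columns of `my_array`.

```python
import numpy as np

template = ('Apple', 'Orange', 5.0)

# Rebuild the (3, 3) object array of tuples from the question.
my_array = np.empty((3, 3), dtype=object)
for i in range(3):
    for j in range(3):
        my_array[i, j] = template

# This is what NumPy does with the tuple on the right-hand side of ==.
as_array = np.asarray(template)
print(as_array.shape)          # (3,): the tuple became three separate fields
print(my_array == as_array)    # the same all-False mask as my_array == template
```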

To verify that this is the case, try experimenting with the following example:

>>> other_array = np.array(['Apple', 'Orange', 5.0] * 3).reshape(3,3)
>>> other_array
array([['Apple', 'Orange', '5.0'],
       ['Apple', 'Orange', '5.0'],
       ['Apple', 'Orange', '5.0']], dtype='<U6')
>>> other_array == template
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

I don't know of any non-hackish way to work around this and get direct equality comparison working. If a hack suffices and your array is not too large, you could try:

mask = np.array(list(map(lambda x: x == template,
                         my_array.flatten()))).reshape(my_array.shape)

or

mask = np.array([x == template for x in my_array.flatten()]).reshape(my_array.shape)
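A less hackish alternative, not part of the original answer: if `template` is first stored as a single object inside a one-element object array, NumPy no longer unpacks it into three fields, so broadcasting compares every cell with the whole tuple. This is a sketch assuming the `my_array` built in the question; `wrapped` is just an illustrative name.

```python
import numpy as np

template = ('Apple', 'Orange', 5.0)

# Build the (3, 3) object array exactly as in the question.
my_array = np.array([None] * 9).reshape((3, 3))
for i in range(my_array.shape[0]):
    for j in range(my_array.shape[1]):
        my_array[i, j] = template

# Store the tuple as ONE object in a length-1 object array.
# Assigning through an integer index keeps the tuple intact.
wrapped = np.empty(1, dtype=object)
wrapped[0] = template

# Shape (1,) broadcasts against (3, 3); each cell is compared to the whole tuple.
mask = my_array == wrapped
print(mask)
```

Because the loop runs in NumPy's object comparison machinery, it still calls Python's tuple `==` per cell, so it is not faster than the list comprehension, but it avoids flattening and reshaping.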

Is there a reason why you need an array of tuples? Could you add another dimension to your array instead, or use pandas for your categorical variables?
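A sketch of the "another dimension" idea, assuming every tuple has the same length (3 here): `tolist()` turns the object array into nested lists of tuples, and `np.array(..., dtype=object)` unpacks those tuples into a new last axis. Each field can then be compared with the template and combined with `.all(axis=-1)`; the resulting mask works with `np.where`, as the asker needs in the comments. The changed cell `('Pear', ...)` is made-up example data.

```python
import numpy as np

template = ('Apple', 'Orange', 5.0)

# Example data: a (3, 3) object array of 3-tuples, with one cell changed
# so the mask is not trivially all True.
my_array = np.empty((3, 3), dtype=object)
for i in range(3):
    for j in range(3):
        my_array[i, j] = template
my_array[1, 2] = ('Pear', 'Orange', 5.0)

# Unpack each tuple into a new last axis: shape (3, 3) -> (3, 3, 3).
# dtype=object keeps 5.0 a float instead of turning it into a string.
expanded = np.array(my_array.tolist(), dtype=object)

# Compare field by field, then require all three fields to match.
mask = (expanded == np.array(template, dtype=object)).all(axis=-1)

# Row and column indices of the cells equal to the template.
rows, cols = np.where(mask)
print(mask)
print(list(zip(rows.tolist(), cols.tolist())))
```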

Daniel
  • The elements are tuples because they come from a pandas MultiIndex, and the rows actually represent sequences of MultiIndex values. Later I need to use np.where to find the indices that are equal to a given template. So if there is a way to get rid of the tuples by, as you suggest, adding another dimension to the array, I think that will work! Is there an easy way to do so? – Oxana Verkholyak Apr 22 '18 at 12:31
  • Thanks for explanation, very helpful! – Oxana Verkholyak Apr 22 '18 at 12:45
  • I’d suggest asking another question and giving more information on the data that you have and what exactly you want to achieve. – Daniel Apr 22 '18 at 12:46