np.where() solution explanation

Question

I am going through the exercises here: https://www.machinelearningplus.com/python/101-pandas-exercises-python/

Problem #16 has a solution (#1) using np.where() that I am having trouble understanding.

import pandas as pd
import numpy as np


print('pandas: {}'.format(pd.__version__))
print('NumPy: {}'.format(np.__version__))
print('-----')

ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
ser2 = pd.Series([1, 3, 10, 13])

# Get the positions of items of 'ser2' in 'ser1' as a list.

# Solution 1
list1 = [np.where(i == ser1)[0].tolist()[0] for i in ser2]
print(list1)
print()

# Solution 2
list2 = [pd.Index(ser1).get_loc(i) for i in ser2]
print(list2)

I have looked up np.where() here:

# https://stackoverflow.com/questions/34667282/numpy-where-detailed-step-by-step-explanation-examples
# https://thispointer.com/numpy-where-tutorial-examples-python/
# https://www.geeksforgeeks.org/numpy-where-in-python/

To be precise, I do not understand the purpose and placement of the two bracketed zeros ([0]).

My last sentence. I do not understand the purpose and placement of the bracketed zeros. I don't understand what they are doing or what they represent in the function. – MarkS, May 23 '19 at 12:21
That's not a question. That's a statement. What don't you understand about it? Some examples of answerable questions: "What does [0] do to a list?", "Why do I have to take the 0th element of np.where?", "What does np.where do?", "What does tolist() do?", etc. – Matt Messersmith, May 23 '19 at 12:26

TheLaurens · Accepted Answer · 2019-05-23T14:15:07.167


np.where(condition) returns a tuple of arrays rather than a single array (see "output of numpy.where(condition) is not an array, but a tuple of arrays: why?"), so you have to index it first; that is the first [0]. What you get back is a NumPy array of the matching positions. There is only one match in this case, so the second [0] picks out that single position. The tolist() call is redundant, because the array can be indexed directly.
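As a rough step-by-step sketch of the above, here is the expression from the question unpacked for a single value. The intermediate names (mask, result, positions, first_position) are made up for illustration and do not appear in the original solution.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([10, 9, 6, 5, 3, 1, 12, 8, 13])
i = 3  # one of the values from ser2

mask = (i == ser1)              # boolean Series, True where ser1 equals 3
result = np.where(mask)         # a tuple containing one array of positions
positions = result[0]           # first [0]: take the array out of the tuple
first_position = positions[0]   # second [0]: take the first matching position

print(type(result), len(result))
print(positions)
print(first_position)
```

The tuple has one entry per dimension of the input; a Series is one-dimensional, so there is exactly one array inside it.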

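It would be better to extend list1 with all of the found indexes, because the original code keeps only the first position when an element occurs more than once in ser1 (and raises an IndexError when an element of ser2 does not occur in ser1 at all):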

list1 = []
for i in ser2:
    # np.where(...)[0] is the array of every position where ser1 equals i
    list1.extend(np.where(i == ser1)[0])
print(list1)
print()
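To make the difference concrete, here is a small sketch with made-up data in which a value is repeated. The series ser1_dup and ser2_dup are hypothetical and are not the ones from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 10 appears twice in ser1_dup.
ser1_dup = pd.Series([10, 9, 6, 10, 3])
ser2_dup = pd.Series([10, 3])

# Indexing with [0][0] keeps only the first position of each value.
first_only = [int(np.where(i == ser1_dup)[0][0]) for i in ser2_dup]

# Extending with the whole array keeps every position of each value.
all_positions = []
for i in ser2_dup:
    all_positions.extend(np.where(i == ser1_dup)[0].tolist())

print(first_only)      # [0, 4]
print(all_positions)   # [0, 3, 4]
```

Here tolist() and int() are used only so the results are plain Python integers, which print the same way regardless of NumPy version.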

Not the best code, in my opinion.

Tip: check the output of each step yourself, and you would have figured this out. Just run np.where(i == ser1) and you will see that it returns a tuple, which you then need to index, and so on.

edited May 23 '19 at 14:15

answered May 23 '19 at 12:29

TheLaurens


You wrote: "It'd be better to extend list1 with the found indexes, because this code fails when an element occurs more than once." Could you possibly point me to a code example of this? I would like to see code that won't break if an element occurs more than once. Also, I tried [np.where(i == ser1)[0][0] for i in ser2], removing the unneeded tolist() as you suggested, and it works just fine. – MarkS May 23 '19 at 13:16
Edited the answer to include the extend – TheLaurens May 23 '19 at 14:15
