I have already found the source for the numpy.ma.where() function, but it seems to call numpy.where(). To better understand it, I would like to look at the source of numpy.where() as well, if possible.

typecasto
usr48
  • This question was asked here; please have a look, there are some answers: https://stackoverflow.com/questions/34667282/numpy-where-detailed-step-by-step-explanation-examples – manas dash Feb 03 '19 at 02:44
  • None of the above questions are about the source code. – usr48 Feb 03 '19 at 03:23
  • Line 2920 in https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/multiarraymodule.c; it is a wrapper around C code. – NaN Feb 03 '19 at 03:39
  • As NaN points out, the function is implemented in C. The specific location of the C function is https://github.com/numpy/numpy/blob/972e10a7eb270bad3677acac0808c46d5cddea93/numpy/core/src/multiarray/multiarraymodule.c#L2916 – Warren Weckesser Feb 03 '19 at 03:52

4 Answers


Most Python functions are written in the Python language, but some functions are written in something more native (often the C language).

Regular Python functions ("pure Python")

There are a few techniques you can use to ask Python itself where a function is defined. Probably the most portable uses the inspect module:

>>> import numpy
>>> import inspect
>>> inspect.isbuiltin(numpy.ma.where)
False
>>> inspect.getsourcefile(numpy.ma.where)
'.../numpy/ma/core.py'

But this won't work with a native ("built-in") function:

>>> import numpy
>>> import inspect
>>> inspect.isbuiltin(numpy.where)
True
>>> inspect.getsourcefile(numpy.where)
TypeError: <built-in function where> is not a module, class, method, function, traceback, frame, or code object

Native ("built-in") functions

Unfortunately, Python doesn't provide a record of source files for built-in functions. You can find out which module provides the function:

>>> import numpy as np
>>> np.where
<built-in function where>
>>> np.where.__module__
'numpy.core.multiarray'

Python won't help you find the native (C) source code for that module, but in this case it's reasonable to look in the numpy project for C source that has similar names. I found the following file:

numpy/core/src/multiarray/multiarraymodule.c

And in that file, I found a list of definitions (PyMethodDef) including:

    {"where",
        (PyCFunction)array_where,
        METH_VARARGS, NULL},

This suggests that the C function array_where is the one that Python sees as "where".

The array_where function is defined in the same file, and it mostly delegates to the PyArray_Where function.

In short

NumPy's np.where function is written in C, not Python. A good place to look is PyArray_Where.
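
The lookup steps above can be rolled into one helper. This is only a minimal sketch, not part of NumPy or the standard library: the name `locate` is hypothetical, and the `_implementation` lookup assumes the wrapper attribute that NumPy 1.17 and later attach to dispatched functions such as np.where. On older versions the attribute is absent, so the fallback uses the function unchanged.

```python
import inspect

import numpy as np


def locate(func):
    """Report where a callable is defined.

    Returns ("python", source_path) for pure-Python callables and
    ("native", module_name) for built-in (C) functions.
    `locate` is a hypothetical helper name used for illustration.
    """
    # Newer NumPy versions wrap functions such as np.where in a dispatch
    # layer; the wrapped callable is exposed as `_implementation`.
    # Fall back to the function itself when the attribute is absent.
    target = getattr(func, "_implementation", func)

    if inspect.isbuiltin(target):
        # No source file is recorded for native functions; the module
        # name is the best hint for where to search the C sources.
        return ("native", target.__module__)

    return ("python", inspect.getsourcefile(target))


print(locate(np.ma.where))  # pure Python: reports the path of a .py file
print(locate(np.where))     # native: reports the providing module's name
print(locate(len))          # native: reports the builtins module
```

For a native result, the module name is the starting point for searching the project's C sources by hand, as described above.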

RJHunter
  • Will this method work for other Python libraries like pandas as well? – usr48 Feb 03 '19 at 04:53
  • @usr48 Yes, the process for finding source code will be similar in other libraries like pandas. Pandas is mostly pure Python, so `inspect.getsourcefile` will work in most cases. – RJHunter Feb 03 '19 at 05:48
  • @RJHunter I just tried to reproduce your examples with numpy version 1.17. After executing `print(inspect.isbuiltin(numpy.where), inspect.getsourcefile(numpy.where))` I got `False <__array_function__ internals>`. Can you explain this? – Michael S Nov 27 '19 at 16:54
  • @MichaelS NumPy introduced an extra wrapper layer of generated Python. If you search the numpy source for `<__array_function__ internals>` you can see that this fake filename comes from the wrapping layer: https://github.com/numpy/numpy/blob/v1.17.4/numpy/core/overrides.py#L174-L191 That code also gives the clue that you can inspect `numpy.where._implementation` to look through that wrapper and find the built-in function which the answer covers. (I don't plan to update the answer; this question has been closed, and I have scaled back my involvement with Stack Overflow.) – RJHunter Dec 06 '19 at 01:40

First, there are two distinct versions of where: one that takes just the condition, and one that takes three arrays.

The simpler one is the most commonly used, and is just another name for np.nonzero. It scans through the condition array twice: once with np.count_nonzero to determine how many nonzero entries there are, which allows it to allocate the return arrays, and a second time to fill in the coordinates of all nonzero entries. The key is that it returns a tuple of arrays, one array for each dimension of condition.
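
A small check of the one-argument form can make this concrete. The sketch below assumes a made-up 2-by-3 boolean array; it compares np.where with np.nonzero and confirms that np.count_nonzero matches the length of each returned index array.

```python
import numpy as np

# Hypothetical condition array, chosen only for illustration.
cond = np.array([[True, False, True],
                 [False, False, True]])

from_where = np.where(cond)
from_nonzero = np.nonzero(cond)

# One index array per dimension of `cond`.
print(len(from_where))            # 2
print(from_where[0].tolist())     # row indices: [0, 0, 1]
print(from_where[1].tolist())     # column indices: [0, 2, 2]

# Both calls give the same coordinates, and each index array has
# as many entries as there are nonzero elements.
same = all(np.array_equal(a, b) for a, b in zip(from_where, from_nonzero))
print(same)                       # True
print(np.count_nonzero(cond))     # 3
```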

The condition, x, y version takes three arrays, which it broadcasts against each other. The return array has the common broadcast shape, with elements chosen from x and y as explained in the answers to your previous question, "How exactly does numpy.where() select the elements in this example?"
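
As an illustration of the three-argument form, the sketch below uses made-up arrays of identical shape, plus a scalar fallback. The values are assumptions chosen only to make the selection visible.

```python
import numpy as np

# Hypothetical inputs, all of shape (4,).
cond = np.array([True, False, True, False])
x = np.array([10, 20, 30, 40])
y = np.array([-1, -2, -3, -4])

# Take from x where cond is True, from y elsewhere.
picked = np.where(cond, x, y)
print(picked.tolist())     # [10, -2, 30, -4]

# A scalar broadcasts against the other arguments.
clipped = np.where(x > 25, x, 0)
print(clipped.tolist())    # [0, 0, 30, 40]
```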

You do realize that most of this code is C or Cython, with a significant amount of preprocessing? It is hard to read, even for experienced users. It is easier to run a variety of test cases and get a feel for what is happening that way.

A couple of things to watch out for. np.where is called like any ordinary Python function, so Python evaluates each argument fully before passing it in. This makes it conditional assignment, not conditional evaluation.
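
The eager evaluation can be observed directly. In this sketch, `reciprocal` is a hypothetical helper that records how many elements it processed, and the input values are made up. It shows the whole array being computed before np.where selects anything, followed by one possible way to restrict the computation with a mask instead.

```python
import numpy as np

calls = []


def reciprocal(a):
    # Hypothetical helper: record how many elements were processed.
    calls.append(a.size)
    # Silence the divide-by-zero warning; the zero entry yields inf.
    with np.errstate(divide="ignore"):
        return 1.0 / a


x = np.array([2.0, 0.0, 4.0])

# reciprocal(x) runs on every element, including the zero, before
# np.where ever looks at the condition.
result = np.where(x != 0, reciprocal(x), 0.0)
print(calls)              # [3]
print(result.tolist())    # [0.5, 0.0, 0.25]

# To compute only the wanted elements, index with the mask first.
mask = x != 0
safe = np.zeros_like(x)
safe[mask] = 1.0 / x[mask]
print(safe.tolist())      # [0.5, 0.0, 0.25]
```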

And unless you pass 3 arrays that match in shape, or scalar x and y, you'll need a good understanding of broadcasting.
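
When the shapes differ, np.broadcast_arrays is one way to see the common shape before calling np.where. The sketch below assumes a made-up column-shaped condition, a row of values, and a scalar.

```python
import numpy as np

# Hypothetical inputs with three different shapes.
cond = np.array([[True], [False]])     # shape (2, 1)
x = np.array([1, 2, 3])                # shape (3,)
y = 0                                  # scalar

# np.broadcast_arrays shows the common shape all three inputs expand to.
b_cond, b_x, b_y = np.broadcast_arrays(cond, x, y)
print(b_cond.shape, b_x.shape, b_y.shape)   # (2, 3) (2, 3) (2, 3)

# np.where returns an array of that same broadcast shape.
out = np.where(cond, x, y)
print(out.shape)                       # (2, 3)
print(out.tolist())                    # [[1, 2, 3], [0, 0, 0]]
```

No single argument decides the output shape here: it comes from all three inputs together.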

hpaulj
  • My confusion is: out of the three parameters, which one decides the shape of the return value? – usr48 Feb 03 '19 at 04:38
  • It's the joint broadcasting. In each of the examples of your other question, all three parameters have the same shape (once they are turned into arrays). This same broadcasting lets us add and multiply arrays. `np.broadcast_arrays` may help you explore this. – hpaulj Feb 03 '19 at 04:50

You can find the code in numpy.core.multiarray

Erik Z

C:\Users\<name>\AppData\Local\Programs\Python\Python37-32\Lib\site-packages\numpy\core\multiarray.py is where I found it.

typecasto