
I have read the standard NumPy documentation on broadcasting: https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html

Every time I explore this topic, I can't find enough internal detail about how broadcasting actually works. Does it have something to do with vectorization?

As per the docs, broadcasting is a memory-efficient operation that does not make actual copies in memory. So how does the arithmetic computation work internally?

Any source code, or any sources that discuss the under-the-hood concepts, would help clarify my doubts.

TheCodeCache
  • The "trick" is in how the [`strides`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.strides.html) are set in the broadcast array (see [How to understand numpy strides for layman?](https://stackoverflow.com/q/53097952)). E.g. you can use a `0` stride to repeat an array indefinitely along one dimension without using any more memory. – jdehesa Mar 12 '19 at 15:49
  • @jdehesa Thanks to the above link, I now understand the concept behind strides, but I still can't connect the dots with broadcasting; I think I'm missing something fundamental. An arithmetic operation between a one-dimensional array and a scalar is vectorized by broadcasting the scalar to the same size as the array. So how does the scalar value get strided (stretched, expanded) to the array's size so that an element-wise operation between them is possible? – TheCodeCache Mar 12 '19 at 16:08
  • I don't think digging deep into source code will help you - it's hard to follow C code. But you might find it instructive to play with `np.broadcast_to` and `np.broadcast_arrays` (whose code is in `np.lib.stride_tricks`). Pay attention to how they manipulate the `strides` attributes. – hpaulj Mar 12 '19 at 16:11
  • Developers have been trying to consolidate the broadcast iteration into one powerful function, `nditer`. The compiled code calls it via the C API, but you can play with it via `np.nditer`. I hesitate to mention it because it isn't a speed tool. But as a learning tool it may help. – hpaulj Mar 12 '19 at 16:17
  • I've drilled down through the source code and hit the actual code snippet; I will explore it later today: https://github.com/numpy/numpy/blob/v1.13.0/numpy/lib/stride_tricks.py#L115-L132 – TheCodeCache Mar 12 '19 at 16:29
  • Notice that `broadcast_to` uses `np.nditer`, which I just mentioned. – hpaulj Mar 12 '19 at 16:39
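
To make the stride comments above concrete, here is a minimal sketch (an illustration added for clarity, not from the thread; it uses only NumPy's documented `as_strided` and `nditer` APIs). It "stretches" a scalar with a 0 stride, then uses `np.nditer` to mimic, in slow Python, the pairwise iteration the compiled ufunc loop performs:

import numpy as np
from numpy.lib.stride_tricks import as_strided

a = np.arange(10)
s = np.array(3)                        # 0-d array holding the scalar

# "Stretch" the scalar to shape (10,) without copying: a 0 stride makes
# every element of the view read the same memory location.
s_view = as_strided(s, shape=(10,), strides=(0,))
print(s_view)                          # [3 3 3 3 3 3 3 3 3 3]
print(s_view.strides)                  # (0,)

# nditer broadcasts its operands and iterates them together, roughly
# what the C-level inner loop does.
out = np.empty_like(a)
for i, j, k in np.nditer([a, s, out],
                         op_flags=[['readonly'], ['readonly'], ['writeonly']]):
    k[...] = i + j
print(out)                             # [ 3  4  5  6  7  8  9 10 11 12]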

1 Answer

A simple illustration of broadcasting - 1d with a scalar:

In [18]: x = np.arange(10)                                                      
In [19]: X,Y = np.broadcast_arrays(x,3)                                         
In [20]: X                                                                      
Out[20]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [21]: Y                                                                      
Out[21]: array([3, 3, 3, 3, 3, 3, 3, 3, 3, 3])
In [22]: Y.strides                                                              
Out[22]: (0,)
In [23]: X+Y                                                                    
Out[23]: array([ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12])
In [24]: [i+j for i,j in zip(X,Y)]                                              
Out[24]: [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
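
To see what the (0,) stride means in practice, here is a quick check (added for illustration; __array_interface__['data'][0] is the buffer's address): the byte address of Y[i] is base + i * strides[0], so a 0 stride maps every index to the same memory location.

import numpy as np

X, Y = np.broadcast_arrays(np.arange(10), 3)
base = Y.__array_interface__['data'][0]     # address of Y's buffer
for i in range(3):
    # address of Y[i] = base + i * Y.strides[0]; with a 0 stride the
    # same single element is re-read at every index
    print(i, base + i * Y.strides[0])       # prints the same address 3 times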

With this construction it's hard to prove that Y doesn't take up as much memory as x. So let's expand x instead:

In [30]: x1 = np.broadcast_to(x,(3,10))                                         
In [31]: x1                                                                     
Out[31]: 
array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
       [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
In [32]: x.__array_interface__                                                  
Out[32]: 
{'data': (30797968, False),
 'strides': None,
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (10,),
 'version': 3}
In [33]: x1.__array_interface__                                                 
Out[33]: 
{'data': (30797968, True),
 'strides': (0, 8),
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (3, 10),
 'version': 3}

x1 shares x's data buffer. No additional memory is used (except for the array object itself). Note that both interfaces show the same data address; the True in x1's case just marks it as read-only.
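
A quick way to confirm the sharing (an added check, using NumPy's documented np.shares_memory and the flags attribute):

import numpy as np

x = np.arange(10)
x1 = np.broadcast_to(x, (3, 10))
print(np.shares_memory(x, x1))     # True: x1 reads from x's buffer
print(x1.flags['OWNDATA'])         # False: x1 does not own any data
print(x1.flags['WRITEABLE'])       # False: broadcast views are read-only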

And making the scalar into a 2d array as well:

In [34]: x2,y2 = np.broadcast_arrays(x1,Y)                                      
In [35]: y2.strides                                                             
Out[35]: (0, 0)
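
With y2.strides == (0, 0), an element-wise loop keeps re-reading the single stored 3 while it walks x1. Here is a rough pure-Python emulation of the broadcasting semantics (an added sketch, not NumPy's actual C implementation, which walks raw pointers using exactly these strides; np.broadcast_shapes requires NumPy >= 1.20):

import numpy as np

def broadcast_index(idx, shape):
    # Map an output index onto an operand's index: size-1 (or missing)
    # dimensions always map to 0 -- the index-space analogue of a 0 stride.
    if not shape:
        return ()
    trailing = idx[len(idx) - len(shape):]
    return tuple(0 if n == 1 else i for i, n in zip(trailing, shape))

def broadcast_add(a, b):
    out_shape = np.broadcast_shapes(a.shape, b.shape)
    out = np.empty(out_shape, dtype=np.result_type(a, b))
    for idx in np.ndindex(*out_shape):
        out[idx] = a[broadcast_index(idx, a.shape)] + b[broadcast_index(idx, b.shape)]
    return out

print(broadcast_add(np.arange(10), np.array(3)))
# [ 3  4  5  6  7  8  9 10 11 12]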
hpaulj
  • Thanks for the good suggestions @hpaulj. However, I'm looking for more clarification on the statement "x1 shares x's data buffer. No additional memory is used (except for the array object itself)." One doubt: is x1 just a view over x, or does it have an actual physical copy in memory? – TheCodeCache Mar 12 '19 at 16:37
  • That `__array_interface__` is a dictionary view of an array object, showing its essential attributes. Each array has its own shape and strides, but may share the `data` buffer with other arrays (if it's a `view`). – hpaulj Mar 12 '19 at 16:41