
I learned how axes are indexed in NumPy arrays from the question "how is axis indexed in numpy's array".

That article says that, for a 2-D array, axis=0 stands for each column and axis=1 for each row. This matches np.mean, which averages by column when axis=0, but np.delete with axis=0 behaves differently: it deletes a row.

import numpy as np

arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
'''
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])
'''
np.mean(arr, 0)
'''
array([5., 6., 7., 8.])
'''
np.delete(arr,1,axis=0)
'''
array([[ 1,  2,  3,  4],
       [ 9, 10, 11, 12]])
'''

Am I misunderstanding something? Why do np.mean and np.delete seem to operate on different axes when axis=0 is passed to both?

Wu Shiauthie

3 Answers

The accepted answer to the question you linked to actually says correctly that

Axis 0 is thus the first dimension (the "rows"), and axis 1 is the second dimension (the "columns")

which is what the code does, and is the opposite of what you said.

This is probably the source of your confusion. As we can see from your own example:

np.delete(arr,1,axis=0)
'''
array([[ 1,  2,  3,  4],
       [ 9, 10, 11, 12]])
'''

Row at index 1 is deleted, which is exactly what we want to happen.

This is a 2D example where we have rows and columns, but it is important to understand how shapes work in general so that they also make sense in higher dimensions. Consider the following example:

[
  [
    [1, 2],
    [3, 4]
  ],
  [
    [5, 6],
    [7, 8],
  ],
  [
    [9, 10],
    [11, 12],
  ]
]

Here we have 3 grids, each of which is 2x2, so the whole thing has shape 3x2x2, which is why there are 12 elements in total. How do we know that axis=0 has 3 elements? If you treat this as a plain nested list rather than a NumPy object, len(arr) == 3. If you then take any element along that axis (any of the "grids"), its length is 2, i.e. len(arr[0]) == 2, because each grid has 2 rows. Finally, to see how many items each row of each grid has, inspect any one of those rows. Taking the second row of the first grid for a change: len(arr[0][1]) == 2.
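To check this yourself, here is a small sketch that compares the lengths of the nested list with the shape NumPy reports. The names nested, arr3d and lengths are only for illustration.

```python
import numpy as np

# The nested list from above: 3 grids, each with 2 rows of 2 items.
nested = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
]
arr3d = np.array(nested)

# Length at each nesting level, measured on the plain Python list.
lengths = (len(nested), len(nested[0]), len(nested[0][1]))

print(lengths)      # (3, 2, 2)
print(arr3d.shape)  # (3, 2, 2)
print(arr3d.size)   # 12
```

The position of each number in the shape is its axis: axis 0 has length 3, axis 1 has length 2, axis 2 has length 2.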

Now, what does np.mean(a, axis=0) mean? It means we will go over each of the items along axis=0 and find their mean. If these items are simply numbers (if a=np.array([1,2,3])) that's easy because the average of 1,2,3 is just the sum of these numbers divided by their quantity.

So, what if we have vectors or grids? What is the average of [2,4,6] and [0,0,0]? The convention is that the average of these two lists is a list of the averages at each index. In other words, it is:

[np.mean([2,0]), np.mean([4,0]), np.mean([6,0])]

which is trivially [1,2,3].
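As a quick check, the sketch below stacks the two vectors into one array and averages along axis 0, then compares that with the per-index averages written out by hand. The variable names are only for illustration.

```python
import numpy as np

# Two vectors stacked along axis 0.
vectors = np.array([[2, 4, 6], [0, 0, 0]])

# Averaging along axis 0 walks over the two vectors and averages index by index.
by_axis = np.mean(vectors, axis=0)

# The same result built by hand, one index at a time.
by_hand = [np.mean([2, 0]), np.mean([4, 0]), np.mean([6, 0])]

print(by_axis)                        # [1. 2. 3.]
print(np.allclose(by_axis, by_hand))  # True
```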

So, why does np.delete behave differently? Because the purpose of delete is to remove an element along some axis rather than to perform an aggregation over that axis. In this particular case we had 3 grids, so removing one of them along axis=0 simply leaves us with 2 grids. We could alternatively remove the second row of every grid (axis=1). That would leave us with 3 grids, but each would have only 1 row instead of 2.
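A minimal sketch of these cases on the 3x2x2 example, with names chosen only for illustration. Looking at the resulting shapes makes the difference visible: delete shortens the chosen axis by one, while mean removes it.

```python
import numpy as np

grids = np.array([
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
])

# Delete the grid at index 1: axis 0 shrinks from 3 to 2.
fewer_grids = np.delete(grids, 1, axis=0)

# Delete the row at index 1 of every grid: axis 1 shrinks from 2 to 1.
fewer_rows = np.delete(grids, 1, axis=1)

# Average the three grids element by element: axis 0 disappears.
mean_grid = np.mean(grids, axis=0)

print(fewer_grids.shape)  # (2, 2, 2)
print(fewer_rows.shape)   # (3, 1, 2)
print(mean_grid.shape)    # (2, 2)
```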

Hopefully, this brings some clarity :)

rudolfovic

Usually I like to think about the axis in numpy (or pandas) as an indicator of the axis "along which" computations are carried out.

In this sense, when you compute the mean along axis 0, that is, along the rows, you do it for each column. But if you delete along axis 0, it means you move along the rows to find the index you will delete.
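A small sketch of that reading, using the array from the question. In all three calls axis=0 means "move along the rows"; only the result of the work differs. np.cumsum is included here only as an example of an operation that keeps the shape.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# Move along the rows and average: one result per column.
col_means = np.mean(arr, axis=0)

# Move along the rows and drop the one at index 1.
without_row = np.delete(arr, 1, axis=0)

# Move along the rows and keep a running total.
running = np.cumsum(arr, axis=0)

print(col_means.shape)    # (4,)
print(without_row.shape)  # (2, 4)
print(running.shape)      # (3, 4)
```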

dmontaner
  • I think what @rudolfovic says is also good and can make me better understand the different functions between `np.mean` and `np.delete`. This is what he says: 'So, why does `np.delete` behave differently? Well, because the purpose of delete is to remove an element along some axis rather than to perform an aggregation over that axis.' – Wu Shiauthie Jun 24 '21 at 15:13
  • `delete` removes an element, `mean` removes (reduces) a dimension. – hpaulj Jun 24 '21 at 16:15
  • The output dimension will depend on the particular computation you are carrying out. If you do `np.cumsum` or `np.roll` for instance the output dimension will be the same as the input one. And you can also think about functions that will return arbitrary dimensions when applied along axes. You can expand to 2 dim doing: `np.apply_along_axis(lambda x: np.array([(x + 1).sum(), (x + 2).sum()]), ... ` And to 3 dims doing: `np.apply_along_axis(lambda x: np.array([(x + 1).sum(), (x + 2).sum(), (x + 3).sum()]), ... ` – dmontaner Jun 24 '21 at 17:01
  • But the `axis` parameter which is an option in all those functions will tell you _along_ which of the dimensions of the array you are working, whether the work is to reduce, or roll or drop items. You have the same meaning of the `axis` in `np.concatenate` for instance. – dmontaner Jun 24 '21 at 17:01

I think your confusion possibly comes from the fact that in delete, the axis refers to the axis you are indexing along when finding the section to delete, while in mean, the axis refers to the axis you are averaging along.

In both cases, axis tells the function which axis to "move along" when performing its operation: delete moves along that axis while searching for what to delete, and mean moves along that axis while calculating averages.
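To make "move along" concrete, here is a rough sketch that reproduces both results with an explicit loop over the axis-0 index. This is only an illustration of the idea, not how NumPy implements either function, and the names manual_mean and manual_delete are made up for this example.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# mean along axis 0: step i through axis 0 and accumulate arr[i].
total = np.zeros(arr.shape[1])
for i in range(arr.shape[0]):
    total += arr[i]
manual_mean = total / arr.shape[0]

# delete along axis 0: step i through axis 0 and keep every arr[i] except i == 1.
manual_delete = np.array([arr[i] for i in range(arr.shape[0]) if i != 1])

print(np.allclose(manual_mean, np.mean(arr, axis=0)))            # True
print(np.array_equal(manual_delete, np.delete(arr, 1, axis=0)))  # True
```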