The accepted answer to the question you linked to correctly says that
Axis 0 is thus the first dimension (the "rows"), and axis 1 is the second dimension (the "columns")
which is what the code does, and is the opposite of what you said.
This is likely the source of your confusion. As we can see from your own example:
np.delete(arr,1,axis=0)
'''
array([[ 1, 2, 3, 4],
[ 9, 10, 11, 12]])
'''
Row at index 1 is deleted, which is exactly what we want to happen.
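For a self-contained version of that call, here is a minimal sketch. It assumes arr is the 3x4 array holding the integers 1 to 12; that is inferred from the output above rather than copied from the question, so treat it as an assumption. It also deletes along axis=1 for comparison:

```python
import numpy as np

# Assumed input: 3 rows, 4 columns, values 1..12 (inferred from the output above).
arr = np.arange(1, 13).reshape(3, 4)

# axis=0 indexes rows, so this removes the row at index 1.
without_row = np.delete(arr, 1, axis=0)
print(without_row.shape)  # (2, 4)

# axis=1 indexes columns, so this removes the column at index 1.
without_col = np.delete(arr, 1, axis=1)
print(without_col.shape)  # (3, 3)
```

Note that np.delete returns a new array and leaves arr itself unchanged.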
This is a 2D example with rows and columns, but it is important to understand how shapes work in general; once you do, they will make sense in higher dimensions too. Consider the following example:
[
[
[1, 2],
[3, 4]
],
[
[5, 6],
[7, 8],
],
[
[9, 10],
[11, 12],
]
]
Here, we have 3 grids, each of which is 2x2, so we have something of shape 3x2x2. This is why we have 12 elements in total. Now, how do we know that at axis=0 we have 3 elements? Because if you look at this as a plain nested list rather than some fancy numpy object, then len(arr) == 3. Then, if you take any of the elements along that axis (any of the "grids", that is), you will see that its length is 2, or len(arr[0]) == 2. That is because each of the grids has 2 rows. Finally, to check how many items each row of each of these grids has, we just have to inspect any one of these rows. Let's look at the second row of the first grid for a change. We will see that len(arr[0][1]) == 2.
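The length checks above can be reproduced directly. In this small sketch the names nested and arr are chosen here for illustration; it compares the plain-list lengths with the shape NumPy reports:

```python
import numpy as np

# The 3x2x2 example from above, as a plain nested list.
nested = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
]

# Lengths of the nested list, one nesting level at a time.
lengths = (len(nested), len(nested[0]), len(nested[0][1]))
print(lengths)  # (3, 2, 2)

# NumPy reports the same numbers as the shape, one entry per axis.
arr = np.array(nested)
print(arr.shape)  # (3, 2, 2)
print(arr.size)   # 12
```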
Now, what does np.mean(a, axis=0) mean? It means we go over each of the items along axis=0 and find their mean. If these items are simply numbers (if a=np.array([1,2,3])), that's easy, because the average of 1,2,3 is just the sum of these numbers divided by how many of them there are.
So, what if we have vectors or grids? What is the average of [2,4,6] and [0,0,0]? The convention is that the average of these two lists is a list of the averages at each index. In other words, it's:
[np.mean([2,0]), np.mean([4,0]), np.mean([6,0])]
which is simply [1,2,3].
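To check this convention against NumPy itself, here is a short sketch with illustrative variable names. It averages the two vectors along axis=0, and then does the same for the three 2x2 grids from the earlier example:

```python
import numpy as np

vectors = np.array([[2, 4, 6],
                    [0, 0, 0]])

# axis=0 averages across the two rows, index by index.
vector_mean = np.mean(vectors, axis=0)
print(vector_mean)  # [1. 2. 3.]

grids = np.array([
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
])

# Averaging the three 2x2 grids along axis=0 yields a single 2x2 grid.
grid_mean = np.mean(grids, axis=0)
print(grid_mean.shape)     # (2, 2)
print(grid_mean.tolist())  # [[5.0, 6.0], [7.0, 8.0]]
```

The aggregated axis disappears from the result: a (3, 2, 2) input gives a (2, 2) output.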
So, why does np.delete behave differently? Because the purpose of delete is to remove an element along some axis rather than to perform an aggregation over that axis. In this particular case we had 3 grids, so removing one of them simply leaves us with 2 grids. We could alternatively remove the second row of every grid (axis=1). That would leave us with 3 grids, but each would have only 1 row instead of 2.
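Both deletions can be sketched on the 3x2x2 example. The array is rebuilt here with np.arange as a shortcut, which gives the same values as the nested list above, and the variable names are illustrative:

```python
import numpy as np

# Same values as the 3x2x2 example: 3 grids, each with 2 rows of 2 items.
grids = np.arange(1, 13).reshape(3, 2, 2)

# Remove the grid at index 1 along axis=0: 2 grids remain.
fewer_grids = np.delete(grids, 1, axis=0)
print(fewer_grids.shape)  # (2, 2, 2)

# Remove the row at index 1 (the second row) of every grid along axis=1.
one_row_each = np.delete(grids, 1, axis=1)
print(one_row_each.shape)     # (3, 1, 2)
print(one_row_each.tolist())  # [[[1, 2]], [[5, 6]], [[9, 10]]]
```

Unlike np.mean, the axis is kept in the result; it just becomes one element shorter.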
Hopefully, this brings some clarity :)