With this randn call you make a 2d array with the specified shape. The dimensions, 10 and 2, don't represent anything - that's an abstract (10,2) array. Meaning comes from how you use it.
In [50]: aa = np.random.randn(10, 2)
In [51]: aa
Out[51]:
array([[-0.26769106, 0.09882999],
[-1.5605514 , -1.38614473],
[ 1.23312852, 0.86838848],
[ 1.2603898 , 2.19895989],
[-1.66937976, 0.79666952],
[-0.15596669, 1.47848784],
[ 1.74964902, 0.39280584],
[-1.0982447 , 0.46888408],
[ 0.84396231, -0.34809148],
[-0.83489678, -1.8093045 ]])
That's just how it is displayed, as rows and columns.
Rather than pass the slices directly to scatter, let's assign them to variables:
In [52]: x = aa[:,0]; y = aa[:,1]; x,y
Out[52]:
(array([-0.26769106, -1.5605514 , 1.23312852, 1.2603898 , -1.66937976,
-0.15596669, 1.74964902, -1.0982447 , 0.84396231, -0.83489678]),
array([ 0.09882999, -1.38614473, 0.86838848, 2.19895989, 0.79666952,
1.47848784, 0.39280584, 0.46888408, -0.34809148, -1.8093045 ]))
We now have two 1d arrays with shape (10,) (that's a 1-element tuple). We can then plot them with:
In [53]: plt.scatter(x,y)
I could just as well have used
x = np.arange(10); y = np.random.randn(10)
to make two 1d arrays.
The dimensions of the aa array have nothing to do with the axes of a scatter plot.
I could select a 'row' of aa, but would only get a (2,) shape array. That can't be plotted against a (10,) array:
In [53]: aa[0,:]
Out[53]: array([-0.26769106, 0.09882999])
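The mismatch is easy to confirm by comparing shapes before plotting. This is a minimal sketch; the seeded generator is my own choice for reproducibility, so the values differ from the aa shown above.

```python
import numpy as np

rng = np.random.default_rng(0)      # seed is an assumption, only for repeatable output
aa = rng.standard_normal((10, 2))   # same shape as aa above, different values

col = aa[:, 0]   # one value per point
row = aa[0, :]   # the two values of the first point

print(col.shape)   # (10,)
print(row.shape)   # (2,)

# scatter needs x and y of equal length, so check before plotting
same_length = col.shape == row.shape
print(same_length)   # False
```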
As for the meaning of dimensions in sum/mean, why not experiment?
Sum all values:
In [54]: aa.sum()
Out[54]: 2.2598841819604134
Sum down the columns, resulting in one value per column:
In [55]: aa.sum(axis=0)
Out[55]: array([-0.49960074, 2.75948492])
It can help to use keepdims, producing a (1,2) array:
In [56]: aa.sum(axis=0, keepdims=True)
Out[56]: array([[-0.49960074, 2.75948492]])
or a (10,1) array:
In [57]: aa.sum(axis=1, keepdims=True)
Out[57]:
array([[-0.16886107],
[-2.94669614],
[ 2.101517 ],
[ 3.45934969],
[-0.87271024],
[ 1.32252115],
[ 2.14245486],
[-0.62936062],
[ 0.49587083],
[-2.64420128]])
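One practical reason to keep the dimension is that the result then broadcasts against the original array. The sketch below uses a made-up, all-positive (10,2) array (not the aa above) and divides each row by its own sum.

```python
import numpy as np

# stand-in (10,2) array with positive values, chosen only for illustration
arr = np.arange(20.0).reshape(10, 2) + 1.0

row_sums = arr.sum(axis=1, keepdims=True)   # shape (10, 1)
normalized = arr / row_sums                 # (10,2) / (10,1) broadcasts row by row

print(row_sums.shape)
print(normalized.sum(axis=1))               # every row now sums to 1
```

Without keepdims the sums have shape (10,), which does not broadcast against (10,2).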
There's some ambiguity when talking about summing along rows or columns when dealing with 2d arrays. It becomes clearer when we apply sum to 1d arrays (where there is only one axis to sum), or to 3d arrays.
For example, note which dimension is missing when I do:
In [58]: np.arange(24).reshape(2,3,4).sum(axis=1).shape
Out[58]: (2, 4)
or
In [59]: np.arange(24).reshape(2,3,4).sum(axis=2)
Out[59]:
array([[ 6, 22, 38],
[54, 70, 86]])
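To see the pattern for every axis at once, a short loop over the same (2,3,4) array shows that the summed axis is the one that disappears from the shape:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# collect the result shape for each choice of axis
shapes = [a.sum(axis=axis).shape for axis in range(a.ndim)]

for axis, shape in enumerate(shapes):
    print(axis, shape)
# 0 (3, 4)
# 1 (2, 4)
# 2 (2, 3)
```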
Again - dimensions of numpy arrays are abstract things. An array can have 0, 1, 2 or more dimensions (up to 32 before NumPy 2.0, 64 since). Most of linear algebra deals with 2d arrays, matrices and "vectors". You can do LA with numpy, but numpy is used for much more.
edit
You could think of your aa as 10 2-element points. Then aa[:,0] are all the x coordinates. A mean with axis=0 would be the "center of mass" of those points.
In [60]: np.mean(aa, axis=0)
Out[60]: array([-0.04996007, 0.27594849])
Mean on axis=1 may not make sense, though you could calculate the norm of the points, sqrt(x^2+y^2), which is the length of the vectors represented by the points.
In [61]: np.linalg.norm(aa, axis=1)
Out[61]:
array([0.28535218, 2.08727523, 1.50821235, 2.53456249, 1.84973271,
1.48669159, 1.79320052, 1.19414978, 0.91292938, 1.99264533])
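As a sanity check, the norm along axis=1 matches the formula written out by hand. The points here are made up so that the lengths are easy to verify:

```python
import numpy as np

# hypothetical points with easy-to-check lengths
pts = np.array([[3.0, 4.0], [1.0, 0.0], [-6.0, 8.0]])

by_hand = np.sqrt(pts[:, 0]**2 + pts[:, 1]**2)   # sqrt(x^2 + y^2) per point
by_norm = np.linalg.norm(pts, axis=1)            # same thing, one value per row

print(by_hand)                        # lengths 5, 1 and 10
print(np.allclose(by_hand, by_norm))  # True
```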
For the direction of these points I'd use:
np.arctan2(aa[:,1], aa[:,0])
arctan2 takes the y values first and the x values second, so this gives the angle measured from the x axis.
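The argument order is easiest to settle with points whose direction is obvious. np.arctan2 takes the y coordinates as its first argument and the x coordinates as its second; the points below are made up for the check.

```python
import numpy as np

# hypothetical points along +x, +y, -x and the diagonal
pts = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [1.0, 1.0]])

# y first, then x: angle measured counterclockwise from the x axis
angles = np.degrees(np.arctan2(pts[:, 1], pts[:, 0]))

print(angles)   # 0, 90, 180 and 45 degrees
```

Swapping the two arguments would instead measure the angle from the y axis.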