A 2d array consists of 2 axes: axis=0 represents the rows and axis=1 represents the columns.

aa = np.random.randn(10, 2) # Here is 2d array, first axis has 10 rows and second axis has 2 columns

array([[ 0.6999521 , -0.17597954],
       [ 1.70622947, -0.85919459],
       [-0.90019284,  0.80774052],
       [-1.42953238,  0.19727917],
       [-0.03416532,  0.49584749],
       [-0.28981586, -0.77484498],
       [-1.31129122,  0.423833  ],
       [-0.43920016, -1.93541758],
       [-0.06667634,  2.09925218],
       [ 1.24633485, -0.04153847]])

Why, when I want to scatter the points, do I only consider the first and second columns, which both lie along axis=1? Do dimensions mean columns when plotting and axes at other times? Can you please explain the reasons for doing it like this? I'd also appreciate any good references on dimensions in this sense.

plt.scatter(x[:,0], x[:,1])  # this also means dimensions or columns?

Why x[:,0], x[:,1] and not x[0,:], x[:,1]?
  • `x[:,0]` is a 1d array. For scatter it doesn't matter whether the 1d array is made directly with `np.array([1,2,3])` or indirectly from the columns or rows of the 2d array. – hpaulj Dec 23 '22 at 22:52
  • `x[0,:]` has 2 elements, the 1st 'row'. `x[:,1]` has 10. You can't 'scatter' 10 against 2. `scatter` wants 2 1d arrays that match in length. – hpaulj Dec 23 '22 at 22:58

2 Answers

It can be difficult to visualize this, especially in multiple dimensions.

The parameters to the [] operator represent the dimensions. Your first dimension is the rows. The first row is array[0]. Your second dimension is the columns. The entire second column is called array[:,1] -- the ":" is a numpy notation that means "take all of this dimension". array[2,1] refers to the second column in the third row.

plt.scatter expects the x coordinate values as its first parameter, and the y coordinate values as its second parameter. plt.scatter(x[:,0], x[:,1]) means "take all of column 0" and "take all of column 1", which is the way scatter wants them.
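
As a minimal sketch of the indexing described above, the following uses a small made-up array (`pts` and its values are hypothetical, chosen so each slice is easy to recognize) and prints what each expression selects:

```python
import numpy as np

# Hypothetical data: 4 points, one per row, stored as (x, y) pairs.
pts = np.array([[0.0, 10.0],
                [1.0, 11.0],
                [2.0, 12.0],
                [3.0, 13.0]])

first_row = pts[0]       # shape (2,): both coordinates of the first point
second_col = pts[:, 1]   # shape (4,): the y value of every point
single = pts[2, 1]       # scalar: third row, second column

# The two equal-length 1d arrays that scatter would receive.
xs, ys = pts[:, 0], pts[:, 1]

print(first_row, second_col, single)
print(xs.shape, ys.shape)
```

Both `xs` and `ys` have one entry per point, which is why they can be paired element by element.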

Tim Roberts
  • yes you didn't answer my question, why take the columns x[:,0], x[:,1] and not take rows and columns which represent the 2 axes/dims of the 2d array? – user4556432 Dec 23 '22 at 19:52
  • Because those are the arguments that plt.scatter expects. It wants two arrays, where each element in the first array matches the corresponding element in the second array. It could have been written differently, but it wasn't. – Frank Yellin Dec 23 '22 at 20:00
  • aha, where can I learn more about this? I can't find this in linear algebra, are there any good references that explain dimensionality? because I also have problems understanding which axis to choose to do mean/sum!! – user4556432 Dec 23 '22 at 20:18
  • `numpy` arrays are more general than linear algebra matrices and vectors (or more abstract). Don't try to find 'meaning' elsewhere. – hpaulj Dec 23 '22 at 22:55
  • Axis 0 is dimension 0 -- the rows in your case. Axis 1 is dimension 1 -- the columns in your case. `sum(aa,axis=0)` would produce two elements with the sums of the columns. `sum(aa,axis=1)` would produce 10 elements, the sum of the coordinates for each point. Playing with it is probably the best way to learn. – Tim Roberts Dec 23 '22 at 23:24

With this randn call you make a 2d array with the specified shape. The dimensions, 10 and 2, don't represent anything - that's an abstract (10,2) array. Meaning comes from how you use it.

In [50]: aa = np.random.randn(10, 2)
In [51]: aa
Out[51]: 
array([[-0.26769106,  0.09882999],
       [-1.5605514 , -1.38614473],
       [ 1.23312852,  0.86838848],
       [ 1.2603898 ,  2.19895989],
       [-1.66937976,  0.79666952],
       [-0.15596669,  1.47848784],
       [ 1.74964902,  0.39280584],
       [-1.0982447 ,  0.46888408],
       [ 0.84396231, -0.34809148],
       [-0.83489678, -1.8093045 ]])

That's a display - with rows and columns.

Rather than pass the slices directly to scatter, let's assign them to variables:

In [52]: x = aa[:,0]; y = aa[:,1]; x,y
Out[52]: 
(array([-0.26769106, -1.5605514 ,  1.23312852,  1.2603898 , -1.66937976,
        -0.15596669,  1.74964902, -1.0982447 ,  0.84396231, -0.83489678]),
 array([ 0.09882999, -1.38614473,  0.86838848,  2.19895989,  0.79666952,
         1.47848784,  0.39280584,  0.46888408, -0.34809148, -1.8093045 ]))

We now have two 1d arrays with shape (10,) (that's a 1 element tuple). We can then plot them with:

In [53]: plt.scatter(x,y)

I could just as well have used

x = np.arange(10); y = np.random.randn(10)

to make two 1d arrays.

The dimensions of the aa array have nothing to do with the axes of a scatter plot.

I could select a 'row' of aa, but will only get a (2,) shape array. That can't be plotted against a (10,) array:

In [53]: aa[0,:]
Out[53]: array([-0.26769106,  0.09882999])
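
A short sketch of the length mismatch, and of the same points stored the other way round. The seeded generator and the transposed array `bb` are assumptions made for illustration; they are not part of the session above.

```python
import numpy as np

rng = np.random.default_rng(0)    # seeded only so the sketch is repeatable
aa = rng.standard_normal((10, 2))

row = aa[0, :]   # one point: shape (2,)
col = aa[:, 1]   # every second coordinate: shape (10,)
print(row.shape, col.shape)   # different lengths, so they cannot be paired

# Same data with one coordinate per row instead of one point per row.
bb = aa.T                     # shape (2, 10)
x, y = bb[0, :], bb[1, :]     # here the rows are the x and y arrays
print(np.array_equal(x, aa[:, 0]), np.array_equal(y, aa[:, 1]))
```

Either layout can hold 10 two-element points; what matters for plotting is only that the two 1d arrays have the same length.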

As for the meaning of dimensions in sum/mean, why not experiment?

Sum all values:

In [54]: aa.sum()
Out[54]: 2.2598841819604134

sum down the columns, resulting in one value per column:

In [55]: aa.sum(axis=0)
Out[55]: array([-0.49960074,  2.75948492])

It can help to use keepdims, producing a (1,2) array:

In [56]: aa.sum(axis=0, keepdims=True)
Out[56]: array([[-0.49960074,  2.75948492]])

or a (10,1) array:

In [57]: aa.sum(axis=1, keepdims=True)
Out[57]: 
array([[-0.16886107],
       [-2.94669614],
       [ 2.101517  ],
       [ 3.45934969],
       [-0.87271024],
       [ 1.32252115],
       [ 2.14245486],
       [-0.62936062],
       [ 0.49587083],
       [-2.64420128]])

There's some ambiguity when talking about summing along rows or columns of 2d arrays. It becomes clearer when we apply sum to 1d arrays (where there is only one axis to sum over) or to 3d arrays.

For example, note which dimension is missing when I do:

In [58]: np.arange(24).reshape(2,3,4).sum(axis=1).shape
Out[58]: (2, 4)

or

In [59]: np.arange(24).reshape(2,3,4).sum(axis=2)
Out[59]: 
array([[ 6, 22, 38],
       [54, 70, 86]])
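
To see the pattern for every axis at once, a small sketch along the same lines (the names `shapes` and `kept_shapes` are introduced here for illustration) loops over the axes and records the result shapes, with and without `keepdims`:

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

shapes = []
kept_shapes = []
for axis in range(a.ndim):
    # The axis named in the call is the one that disappears from the shape.
    shapes.append(a.sum(axis=axis).shape)
    # With keepdims=True it stays, but with length 1.
    kept_shapes.append(a.sum(axis=axis, keepdims=True).shape)

for axis, (s, k) in enumerate(zip(shapes, kept_shapes)):
    print(axis, s, k)
```

Reading the printed shapes side by side shows that `axis=n` removes dimension `n` and leaves the others untouched.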

Again - dimensions of numpy arrays are abstract things. An array can have 0, 1, 2 or more dimensions (the limit was 32 when this was written; NumPy 2.0 raised it to 64). Most of linear algebra deals with 2d arrays, matrices and "vectors". You can do LA with numpy, but numpy is used for much more.

edit

You could think of your aa as 10 2-element points. Then aa[:,0] are all the x coordinates. A mean with axis=0 would be the "center of mass" of those points.

In [60]: np.mean(aa, axis=0)
Out[60]: array([-0.04996007,  0.27594849])

Mean on axis=1 may not make sense, though you could calculate the norm of the points (sqrt(x^2+y^2)), or the length of the vectors represented by the points.

In [61]: np.linalg.norm(aa, axis=1)
Out[61]: 
array([0.28535218, 2.08727523, 1.50821235, 2.53456249, 1.84973271,
       1.48669159, 1.79320052, 1.19414978, 0.91292938, 1.99264533])
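
The two reductions can be checked by hand on a tiny made-up array (`pts` is hypothetical, with values picked so the norms come out as whole numbers):

```python
import numpy as np

# Hypothetical data: 3 points, one (x, y) pair per row.
pts = np.array([[3.0, 4.0],
                [0.0, 2.0],
                [-6.0, 8.0]])

# axis=0 collapses the 3 points: one value per coordinate.
center = pts.mean(axis=0)               # shape (2,)

# axis=1 collapses the 2 coordinates: one value per point.
lengths = np.linalg.norm(pts, axis=1)   # shape (3,)

print(center.shape, center)
print(lengths.shape, lengths)
```

The result has the size of whichever dimension was not reduced, which is a quick way to check that the right axis was chosen.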

For direction of these points I'd use:

np.arctan2(aa[:,1], aa[:,0])

(arctan2 takes the y values first and the x values second; switch the 0 and 1 to measure the angle from the y axis instead).
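
One way to settle the argument order is to try it on points whose direction is known in advance. In this sketch the array `pts` is made up for illustration, and `np.arctan2` is called with the y column first and the x column second:

```python
import numpy as np

# Hypothetical points on the axes: right, up, left, down.
pts = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [-1.0, 0.0],
                [0.0, -1.0]])

# Angle of each point measured counterclockwise from the positive x axis.
angles = np.arctan2(pts[:, 1], pts[:, 0])
degrees = np.degrees(angles)

print(degrees)
```

Like the norm, this gives one value per point, computed from the two coordinates of each row.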

hpaulj
  • thanks for the prompt response, I understand this but I want to be able to find the relation between the dimensions and axes and the operations made among them. but which type of problems suit taking the sum across the 1st dim(axis=0)/2nd(axis=1), 3rd(axis=-1) ex, calculating the distances between points, it's always calculated across the second axis, why? when should I calculate across the rows, (axis=0)? I'd really appreciate your answer – user4556432 Dec 24 '22 at 00:50
  • What are you calling points? Are you thinking of the `aa` array as 10 points, with each point represented as 2 values? What the "dimensions" mean is in your head, not in `numpy`. What if you made a (2,10) array, and plotted `bb[0,:]` against `bb[1,:]`? Either can represent 10 2-element points. 3 dimensions could represent something else, for example a (400,600,3) array might be a 400x600 color image. – hpaulj Dec 24 '22 at 02:01
  • if we take the scatter plot of bb[0,:] and bb[1,:] we would have plotted the points projected on the x axis – user4556432 Dec 24 '22 at 11:35
  • I think of aa as 10 points with coordinates on 2 axes (dimensions: x, y) if we are in a 2d array but what are the types of problems that require us to calculate the distances between points on the second axis and not on the first axis? when I have something like this (400,600,3) or with batches (6,400,600,3) what are the reasons that make me average across the first axis/2nd/3rd? like is it a rule that you always take the mean across the first dim (axis=0) a rule to take differences.sum(axis=1) between points across the first dim? – user4556432 Dec 24 '22 at 13:59
  • What operation makes sense for a particular array axis depends on what meaning YOU assigned to the axis. `numpy` by itself doesn't determine that. I added to my answer a `norm` taken on axis 1. `atan2` could be used to get the angle, direction, of those 10 'points'. – hpaulj Dec 24 '22 at 15:11
  • I'm aware that the option axis = ... doesn't know about the information you have in the matrix, I mean it doesn't consider that you have points in the matrix. It just makes the function operate over the designated axis if you want rows to use axis=0. If columns, axis=1 and it's a problem that I'm facing, when I'm in a situation calculating the distances/max/min I usually don't know what axis to choose in order for an operation (min,max,...) to do its job, obviously there's no rule to follow for me to know – user4556432 Dec 24 '22 at 15:30
  • and I know it depends on the problem I'm dealing with! but this lack of knowledge in which book (because I don't know the concept name for this to look for) does it lie and if you can recommend me a good book/article that helped you just to understand on which axis should I depend doing the axes calculations I'd be thankful – user4556432 Dec 24 '22 at 15:30
  • quoting you; I added to my answer a norm taken on axis 1. atan2 could be used to get the angle, direction, of those 10 'points'. then how would you know like what does this axis =1 (columns) represents solving the problem of getting the angle or the direction of the 10 points for you to choose this axis? – user4556432 Dec 24 '22 at 15:36
  • because axis=1 ==> columns ==> collapse columns so how to know which axis to collapse? – user4556432 Dec 24 '22 at 15:45
  • https://stackoverflow.com/questions/19389910/in-python-numpy-what-is-a-dimension-and-axis like this if there's more info to it!!! – user4556432 Dec 24 '22 at 15:56
  • I know that `aa` has shape (10,2), and that you/we consider this to represent 10 2-element points. If I want a separate result for each "point", 10 in total, I'll use `axis=1` for a reduction operation like `norm`, or pass `aa[:,i]` arrays to the function. If I want a size 2 result, some sort of summary over all points, axis=0 most likely is the correct choice. Sometimes I have to experiment with small cases where the choice is obvious. Playing with just 2 points in a (2,2) array would be confusing. – hpaulj Dec 24 '22 at 16:14
  • Thank You, I needed this. and if you happen to know a book in the future to expand my knowledge on this I'd appreciate it – user4556432 Dec 24 '22 at 16:54
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/250634/discussion-between-user4556432-and-hpaulj). – user4556432 Dec 24 '22 at 17:06
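
The same reasoning carries over to the batched image shape raised in the comments. The sketch below assumes the layout (batch, height, width, channel) and uses a tiny (6,4,5,3) stand-in for (6,400,600,3); the array contents and variable names are hypothetical.

```python
import numpy as np

# Hypothetical batch: 6 images, 4x5 pixels, 3 color channels.
batch = np.arange(6 * 4 * 5 * 3, dtype=float).reshape(6, 4, 5, 3)

# Collapse the batch axis: one "average image".
mean_image = batch.mean(axis=0)        # shape (4, 5, 3)

# Collapse the channel axis: one value per pixel of each image.
gray = batch.mean(axis=-1)             # shape (6, 4, 5)

# Collapse both pixel axes: one mean color per image.
mean_color = batch.mean(axis=(1, 2))   # shape (6, 3)

print(mean_image.shape, gray.shape, mean_color.shape)
```

In each case the axis to name is the one you no longer want in the result, and which one that is depends on the meaning you have given to each axis.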