
I'm trying to generate N data points for three random variables that are jointly normal in python. If I use the following code:

import numpy as np
import scipy
import pandas
import sys
from scipy.linalg import block_diag
from pandas import *
N=100
Sigma=np.identity(3)
Mu=np.zeros((3,1))
Z=np.random.multivariate_normal(Mu, Sigma, N)

I get the following error message:

in <module>
    Z=np.random.multivariate_normal(Mu, Sigma, N)
  File "mtrand.pyx", line 4067, in numpy.random.mtrand.RandomState.multivariate_normal
ValueError: mean must be 1 dimensional

This means that the dimension of np.zeros((3,1)) is not 1. After changing the line Mu=np.zeros((3,1)) to Mu=np.zeros(3), it works. This implies that np.zeros(3) is 1 dimensional.
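A minimal sketch of the fix described above, plus one alternative for when the mean already exists as a (3,1) column. The names `Mu_col` and `Z2` are illustrative and not part of the original code.

```python
import numpy as np

N = 100
Sigma = np.identity(3)

# 1-D mean vector of shape (3,), which multivariate_normal accepts
Mu = np.zeros(3)
Z = np.random.multivariate_normal(Mu, Sigma, N)
print(Z.shape)  # (100, 3)

# Hypothetical case: the mean is already a (3, 1) column.
# ravel() flattens it to 1-D before the call.
Mu_col = np.zeros((3, 1))
Z2 = np.random.multivariate_normal(Mu_col.ravel(), Sigma, N)
print(Z2.shape)  # (100, 3)
```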

Since np.zeros(3) and np.zeros((3,1)) both contain three zeros, I assumed both would be 1 dimensional. Checking Mu.ndim in each case, I found that np.zeros(3) has one dimension and np.zeros((3,1)) has two. My question is:

Why does NumPy make a distinction between np.zeros((3,1)) and np.zeros(3) regarding their dimensions (why is this distinction useful)?

ExcitedSnail
  • 1
    Is a vector with 3 scalars and a 3x1 matrix a different thing? That's the difference between 1d and 2d. – CJR Dec 28 '21 at 14:00
  • @CJR Thanks! In some other languages, such as MATLAB, a vector with 3 scalars and a 3x1 matrix are the same thing. In linear algebra, they are also the same thing. Are they treated as different things in computer science? – ExcitedSnail Dec 28 '21 at 14:13
  • Linear algebra considers a vector in 3-space the same as an overdetermined system of equations in 1-space? – CJR Dec 28 '21 at 14:20
  • @CJR Those two things are of course different in linear algebra. What I was saying is that in linear algebra, we don't distinguish between a vector with 3 coordinates and a 3x1 matrix. For example, $[0 0 0]^\top$ in linear algebra is both a 3-dimensional vector and a 3x1 matrix. – ExcitedSnail Dec 28 '21 at 15:20
  • 2
    In MATLAB everything is 2d. Size of a 'scalar' is (1,1). In `numpy` any dimension 0-32 is possible. `np.zeros((1,1,3,1))` is a 4d array with 3 elements. Python has scalars and lists. Lists are 1d, though they may be nested. – hpaulj Dec 28 '21 at 15:54
  • @hpaulj Thanks! This is very clear. – ExcitedSnail Dec 29 '21 at 14:44

1 Answer

4

It's normal for them to have different dimensions. np.zeros(3) is one array made of 3 zeros, while np.zeros((3,1)) behaves like 3 arrays, each made of 1 zero.

If you print Mu[0] in your example, you will get a one-element array, [0.], whereas if you print Mu[0] after defining Mu with np.zeros(3), you will get the scalar 0.0.
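The difference can be checked directly. This is a small sketch; the variable names `a` and `b` are only for illustration.

```python
import numpy as np

a = np.zeros(3)        # 1-D: one axis of length 3
b = np.zeros((3, 1))   # 2-D: 3 rows, 1 column

print(a.ndim, a.shape)  # 1 (3,)
print(b.ndim, b.shape)  # 2 (3, 1)

print(a[0])        # 0.0   (a scalar)
print(b[0])        # [0.]  (a 1-D array holding one element)
print(b[0].shape)  # (1,)
print(b[0, 0])     # 0.0   (index both axes to reach the scalar)
```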

I can think of cases where this distinction is useful, especially when working with features in machine learning. If I have a sequence of features of size 1, I would want to use shape [n,1] rather than [n], because that helps the model (say, an LSTM) distinguish the sequence length from the feature size.
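As a rough illustration of that layout, here is a sketch using plain NumPy with no ML library. The 5-step series and the names `series`, `seq` and `seq3` are made up for the example.

```python
import numpy as np

# Hypothetical univariate series: 5 time steps, one value per step
series = np.arange(5, dtype=float)   # shape (5,)

# Add an explicit feature axis so the layout is (time steps, features)
seq = series.reshape(-1, 1)          # shape (5, 1)
n_steps, n_features = seq.shape
print(n_steps, n_features)  # 5 1

# The same (steps, features) layout still applies with more features
seq3 = np.zeros((5, 3))
print(seq3.shape)  # (5, 3)

# The two shapes are not interchangeable: broadcasting (5,) against (5, 1)
# yields a 5x5 grid rather than an elementwise sum
print((series + seq).shape)  # (5, 5)
```

With shape (5,) alone, code reading the array cannot tell whether 5 is the number of steps or the number of features. The explicit second axis removes that ambiguity.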

Wazaki
  • 1
    YES. See e.g. https://stackoverflow.com/a/38321679/2640045 – Lukas S Dec 28 '21 at 13:57
  • 1
    An array with multiple dimensions is not multiple arrays. It's still a single memory contiguous C array on the backend. – CJR Dec 28 '21 at 13:58
  • @CJR Thanks for the clarification. You're right: numpy allocates one contiguous block of memory for the array. I just used that to explain the difference in an easier way. – Wazaki Dec 28 '21 at 14:40
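The point about memory made in the comments above can be checked with a short sketch. The names `b` and `flat` are illustrative.

```python
import numpy as np

b = np.zeros((3, 1))
print(b.flags['C_CONTIGUOUS'])    # True: one contiguous buffer

flat = b.reshape(3)               # 1-D view onto the same buffer
print(np.shares_memory(b, flat))  # True

flat[0] = 7.0                     # writing through the view...
print(b[0, 0])                    # 7.0: ...changes the 2-D array too
```

Here the shape only describes how the same three values are indexed; reshaping a contiguous array does not copy or split the data.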