This question might be silly, but I couldn't find an explanation for it.

I am coding the multivariate probability density function from scratch (for study purposes), and one of the things I need to compute is the covariance matrix of the data. I am using the Iris dataset (150 samples, 4 features), and when I run:


cov_matrix = np.cov(X)
print(cov_matrix.shape)  # (150, 150)

I don't understand why it returns a 150x150 matrix. Is this an "element-wise covariance matrix"? Shouldn't it be a 4x4 covariance matrix?

Thanks in advance.

tdy
heresthebuzz

2 Answers

By default, NumPy assumes that variables are in rows and observations are in columns. From the numpy.cov documentation:

rowvar : bool, optional
If rowvar is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.
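To make the effect of rowvar concrete, here is a minimal sketch. It assumes a random 150x4 array as a stand-in for the Iris feature matrix (the data itself is made up; only the shape matters here).

```python
import numpy as np

# Assumption: random data standing in for Iris (150 samples, 4 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))

# Default rowvar=True: each of the 150 rows is treated as a variable,
# so the result is a 150x150 matrix.
cov_default = np.cov(X)

# rowvar=False: each of the 4 columns is treated as a variable.
cov_features = np.cov(X, rowvar=False)

print(cov_default.shape)   # (150, 150)
print(cov_features.shape)  # (4, 4)
```

So the 150x150 result is not element-wise; it is the covariance between samples, each one treated as a variable observed 4 times.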

Tim

The numpy.cov reference page documents an argument called rowvar, which is set to True by default. Its explanation reads:

If rowvar is True (default), then each row represents a variable, with observations in the columns. Otherwise, the relationship is transposed: each column represents a variable, while the rows contain observations.

So, it assumes the given matrix has observations in the columns. Hence, you either need to input $X^T$ (via X.T) or call this function with rowvar=False.
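As a quick check that the two options agree, the sketch below compares them against each other and against a hand-computed sample covariance. The random 150x4 array is an assumption standing in for the Iris features, and the manual formula uses the n - 1 normalization that np.cov applies by default.

```python
import numpy as np

# Assumption: random data with the same shape as the Iris feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))

# Option 1: transpose so that variables are in rows.
cov_a = np.cov(X.T)

# Option 2: keep X as is and declare that variables are in columns.
cov_b = np.cov(X, rowvar=False)

# Manual sample covariance: center each column, then X_c^T X_c / (n - 1).
Xc = X - X.mean(axis=0)
cov_manual = Xc.T @ Xc / (X.shape[0] - 1)

print(cov_a.shape)                     # (4, 4)
print(np.allclose(cov_a, cov_b))       # True
print(np.allclose(cov_a, cov_manual))  # True
```

When writing the density from scratch, rowvar=False is arguably the more readable choice, since the samples-in-rows layout of X stays untouched.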

gunes