I am working with a spaCy nlp model and would like to normalize nlp.vocab.vectors. The spaCy documentation on vectors states that it is a numpy ndarray.
I've googled a fair bit about normalizing numpy arrays as stated here, here and here.
So I tried the following three approaches:
import spacy
import numpy as np
nlp = spacy.load('en_core_web_lg')
matrix = nlp.vocab.vectors # Shape (514157, 300)
# Approach 1
matrix_norm1 = matrix/np.linalg.norm(matrix)
print(matrix_norm1.shape) # Shape (514157,)
# Approach 2
#matrix_norm2 = matrix / np.sqrt(np.sum(matrix**2))
## Results in TypeError: unsupported operand type(s) for ** or pow(): 'spacy.vectors.Vectors' and 'int'
# Approach 3
matrix_norm3 = matrix / (np.mean(matrix) - np.std(matrix))
print(matrix_norm3.shape) # => Shape (514157,)
The two approaches that return a result (1 and 3) do not retain the original dimensions (514157, 300); the output has shape (514157,) instead. Any suggestions on how I can normalize the vectors while keeping the shape?
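To make the goal concrete, here is a minimal sketch of the result I am after: every row scaled to unit L2 length, with the 2-D shape preserved. It uses a small random array as a stand-in for the real vectors. Treating `nlp.vocab.vectors.data` as the place where the underlying ndarray lives is my assumption, and the names `norms` and `safe_norms` are only illustrative.

```python
import numpy as np

# Stand-in for the real data: one row per word, 300 columns.
# Assumption: with spaCy, the raw ndarray would come from nlp.vocab.vectors.data.
rng = np.random.default_rng(0)
matrix = rng.normal(size=(1000, 300)).astype(np.float32)
matrix[0] = 0.0  # simulate a word with an all-zero vector

# L2 norm of each row; keepdims=True gives shape (n, 1) so it broadcasts per row
norms = np.linalg.norm(matrix, axis=1, keepdims=True)

# Replace zero norms with 1 so all-zero rows stay zero instead of becoming NaN
safe_norms = np.where(norms == 0, 1.0, norms)
matrix_norm = matrix / safe_norms

print(matrix_norm.shape)  # (1000, 300)
```

Note that `np.linalg.norm(matrix)` without an `axis` argument returns a single scalar for the whole array, so this per-row version is a different operation from Approach 1.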