I have a training set where both the input (data['qvec']) and the output (data['tvec']) are normalized 300-dimensional vectors. I want to train a linear transform theta (a 300x300 matrix) to minimize the cost function:
from scipy.spatial.distance import cosine

def cost_function(data, theta):
    # mean cosine distance between the transformed qvec and the target tvec
    dists = [cosine(data.iloc[i]['qvec'].dot(theta), data.iloc[i]['tvec'])
             for i in range(len(data))]
    return sum(dists) / len(data)
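For reference, the same cost in vectorized form looks roughly like this (a sketch; it assumes each 'qvec'/'tvec' cell holds a 1-D numpy array of length 300, and cost_function_vec is just a name I'm using here):

import numpy as np

def cost_function_vec(data, theta):
    Q = np.stack(data['qvec'].to_list())    # (N, 300) matrix of inputs
    T = np.stack(data['tvec'].to_list())    # (N, 300) matrix of targets
    P = Q.dot(theta)                        # transformed inputs, (N, 300)
    # row-wise cosine distance: 1 - (p . t) / (||p|| * ||t||)
    sims = np.einsum('ij,ij->i', P, T) / (np.linalg.norm(P, axis=1) * np.linalg.norm(T, axis=1))
    return float(np.mean(1.0 - sims))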
I am assuming the update function will be similar to the one used in multivariate gradient descent, something like:
def update_theta(data, theta, alpha):
    # theta is assumed to be a 300x300 numpy array
    for m in range(300):
        for n in range(300):
            cost = [(data.iloc[i]['qvec'].dot(theta) - data.iloc[i]['tvec']) * ????
                    for i in range(len(data))]
            theta[m, n] = theta[m, n] - alpha / len(data) * sum(cost)
    return theta
I know that when theta is a 300x1 matrix, ???? is data.iloc[i]['qvec'][m], but what would it be for a 300x300 matrix? If my approach is way off, or if there is already a package for this, I'd also appreciate it if anyone could point me in the right direction.
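For context, the 300x1 case I am referring to is the usual squared-error gradient descent update, roughly like this (a sketch; here 'tvec' would be a scalar target, theta a length-300 vector, and update_theta_300x1 is just a name for illustration):

def update_theta_300x1(data, theta, alpha):
    # the residual is a scalar, so the gradient for component m is
    # simply the residual times qvec[m]
    for m in range(300):
        grad = [(data.iloc[i]['qvec'].dot(theta) - data.iloc[i]['tvec']) * data.iloc[i]['qvec'][m]
                for i in range(len(data))]
        theta[m] = theta[m] - alpha / len(data) * sum(grad)
    return theta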