
I have a training set of data where both the input (data['qvec']) and the output (data['tvec']) are normalized 300-dimensional vectors. I want to train a linear transform, theta (a 300x300 matrix), to minimize the cost function:

from scipy.spatial.distance import cosine

def cost_function(data, theta):
    # Mean cosine distance between each transformed input (qvec.dot(theta))
    # and its target (tvec); iterate by position so .iloc is always valid.
    dists = [cosine(data.iloc[i]['qvec'].dot(theta), data.iloc[i]['tvec'])
             for i in range(len(data))]
    return sum(dists) / len(data)
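For reference, here is an equivalent vectorized form of the same cost (just a sketch, assuming the per-row vectors can be stacked into NumPy arrays Q and T of shape (N, 300); those names are placeholders, not part of my actual code):

import numpy as np

def cost_function_vec(Q, T, theta):
    # Q, T: (N, 300) arrays, e.g. Q = np.vstack(data['qvec'].values)
    P = Q.dot(theta)                          # transformed inputs, shape (N, 300)
    sims = np.sum(P * T, axis=1) / (
        np.linalg.norm(P, axis=1) * np.linalg.norm(T, axis=1))
    return np.mean(1.0 - sims)                # cosine distance = 1 - cosine similarity

With Q = np.vstack(data['qvec'].values) and T = np.vstack(data['tvec'].values), this should agree with cost_function above up to floating-point error.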

I am assuming that there will be an update function that is similar to multi-variable gradient descent. That is:

def update_theta(data, theta, alpha):
    for m in range(300):
        for n in range(300):
            # ???? is the partial-derivative term I don't know for a 300x300 theta
            cost = [(data.iloc[i]['qvec'].dot(theta) - data.iloc[i]['tvec']) * ????
                        for i in range(len(data))]
            theta[m, n] = theta[m, n] - alpha / len(data) * sum(cost)
    return theta

I know that when theta is a 300x1 matrix, ???? is data.iloc[i]['qvec'][m], but what would it be for a 300x300 matrix? If my approach is way off, or if there is already a package for this, I'd also appreciate it if someone could point me in the right direction.
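On the "existing package" angle, one thing I have been experimenting with is letting an autograd library compute the gradient instead of deriving ???? by hand. A rough sketch with PyTorch, assuming data is the DataFrame above and the vectors can be stacked into (N, 300) arrays (Q and T are again placeholder names):

import numpy as np
import torch

# Stack the per-row vectors into dense tensors.
Q = torch.tensor(np.vstack(data['qvec'].values), dtype=torch.float32)
T = torch.tensor(np.vstack(data['tvec'].values), dtype=torch.float32)

# Start theta at the identity transform (an arbitrary but reasonable choice).
theta = torch.eye(300, requires_grad=True)
optimizer = torch.optim.SGD([theta], lr=0.01)   # lr plays the role of alpha

for epoch in range(100):
    optimizer.zero_grad()
    # Mean cosine distance = 1 - mean cosine similarity, same as cost_function above.
    loss = (1 - torch.nn.functional.cosine_similarity(Q @ theta, T, dim=1)).mean()
    loss.backward()                             # autograd supplies the ???? term
    optimizer.step()

I am not sure whether this is equivalent to the hand-rolled update above, but the loss it minimizes should match cost_function up to floating-point differences.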
