My teammates and I are trying to code up an implementation of gradient descent, and I think we're pretty close.
We've attempted to follow the steps laid out in the first answer to this question, namely:
1. Calculate the hypothesis h = X * theta
2. Calculate the loss = h - y and maybe the squared cost (loss^2)/(2m)
3. Calculate the gradient = X' * loss / m
4. Update the parameters theta = theta - alpha * gradient
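To make sure we're reading those steps correctly, here is a rough sketch of one batch iteration as we currently picture it (hypothetical names X, y, theta; m examples, n features; this is just our mental model, not code from our project):

static void gradientDescentStep(double[][] X, double[] y, double[] theta, double alpha)
{
    int m = X.length;      // number of training examples
    int n = theta.length;  // number of features

    // step 1: hypothesis h = X * theta
    double[] h = new double[m];
    for (int p = 0; p < m; p++)
        for (int j = 0; j < n; j++)
            h[p] += X[p][j] * theta[j];

    // step 2: loss = h - y
    double[] loss = new double[m];
    for (int p = 0; p < m; p++)
        loss[p] = h[p] - y[p];

    // step 3: gradient = X' * loss / m (assuming X' means the transpose of X)
    double[] gradient = new double[n];
    for (int j = 0; j < n; j++)
    {
        for (int p = 0; p < m; p++)
            gradient[j] += X[p][j] * loss[p];
        gradient[j] /= m;
    }

    // step 4: theta = theta - alpha * gradient
    for (int j = 0; j < n; j++)
        theta[j] -= alpha * gradient[j];
}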
But as you can see from our code below, we're at a bit of a loss as to how to calculate the gradient. Have we set it up correctly? How should that calculation be executed? And what exactly is the difference between X' and X?
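Our working assumption is that X' denotes the transpose of X, which would make the shapes line up as follows (our notation, just for orientation):

// X        : m x n  (instances x features)
// X'       : n x m  (transpose)
// loss     : m x 1  (one residual per instance)
// X' * loss: n x 1  (one gradient component per feature/weight)

If that's right, each weight update should combine the loss with the corresponding feature value. With that assumption stated, here is our attempt so far: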
double loss, cost, hypothesis;
int p, iteration;
iteration = 0;

do
{
    iteration++;
    cost = 0.0;

    // loop through all instances (complete one epoch)
    for (p = 0; p < number_of_files__train; p++)
    {
        hypothesis = calculateHypothesis( weights, feature_matrix__train, p, globo_dict_size );

        // note: this is y - h, the sign-flipped version of step 2's h - y
        loss = outputs__train[p] - hypothesis;

        // this is the part we're unsure about: should the gradient replace
        // the loss * feature term here, or be multiplied in like this?
        for (int i = 0; i < globo_dict_size; i++)
        {
            weights[i] += LEARNING_RATE * loss * feature_matrix__train[p][i]
                        * calculateGradient( weights, i, number_of_files__train, loss );
        }

        // summation of squared error (error value for all instances)
        cost += (loss * loss);
    }
    cost = cost / (2 * number_of_files__train);
}
while (cost != 0 && iteration <= MAX_ITER); // cost will rarely be exactly 0.0 with doubles
}
static double calculateHypothesis( double weights[], double[][] feature_matrix, int file_index, int globo_dict_size )
{
    // dot product of the weight vector and this instance's feature vector
    double sum = 0.0;
    for (int i = 0; i < globo_dict_size; i++)
    {
        sum += ( weights[i] * feature_matrix[file_index][i] );
    }

    // bias term, stored as the extra weight at index globo_dict_size
    sum += weights[ globo_dict_size ];

    // note: wrapping the sum in a sigmoid makes this the logistic-regression
    // hypothesis rather than the plain linear h = X * theta from step 1
    return sigmoid(sum);
}
private static double sigmoid(double x)
{
    return 1 / (1 + Math.exp(-x));
}
static double calculateGradient( double weights[], int i, int number_of_files__train, double loss )
{
    // our current guess: scale this weight by the loss, averaged over m
    return weights[i] * loss / number_of_files__train;
}
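For what it's worth, our literal reading of step 3 (gradient = X' * loss / m) is that the gradient for weight i should be built from the feature value, not from the weight itself. For a single training instance that would reduce to something like the hypothetical sketch below (unverified; the full batch gradient would sum this quantity over every instance). Is this closer to what the answer intends?

// sketch only: our per-instance reading of step 3 (unverified assumption);
// the batch version would accumulate this over all training instances
static double calculateGradientForInstance( double[][] feature_matrix, int p, int i, int number_of_files__train, double loss )
{
    return feature_matrix[p][i] * loss / number_of_files__train;
}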