I am trying to implement the standard gradient descent algorithm with PyTorch in order to perform dimensionality reduction (PCA) on the Indian Pines dataset. More specifically, I am trying to estimate the matrix U1 that minimizes ||X - (U1 @ U1.T) @ X||^2, where U1.T denotes the transpose of U1, @ denotes matrix multiplication, || || denotes the Frobenius norm, and X denotes the data (reconstruction error minimization).
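To make the objective concrete, here is a small sketch of it as a PyTorch function (the name reconstruction_error and the argument X are only for illustration; X stands for the bands-by-pixels data matrix):

import torch

def reconstruction_error(U1, X):
    # Squared Frobenius norm of the difference between the data and its
    # reconstruction from the subspace spanned by the columns of U1.
    return torch.norm(X - (U1 @ U1.T) @ X, p='fro') ** 2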
For starters, I have vectorized the data, so the variable indian_pines has size torch.Size([220, 21025]), and I initialize U1 randomly with U1 = torch.rand(size=(220, 150), dtype=torch.float64, requires_grad=True).
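For reference, a self-contained version of that setup looks roughly like this (the random tensor is only a placeholder so the snippet runs on its own; in my code indian_pines holds the actual, vectorized data):

import torch

# Placeholder standing in for the vectorized Indian Pines cube
# (220 spectral bands x 21025 pixels); the real data is loaded from the dataset.
indian_pines = torch.rand(220, 21025, dtype=torch.float64)

# Projection matrix to be estimated: 220 bands reduced to 150 components.
U1 = torch.rand(size=(220, 150), dtype=torch.float64, requires_grad=True)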
For the method itself, I have the following code:
n_iters = 100
learning_rate = 2e-9

for epoch in range(n_iters):
    # forward: reconstruct the data from the U1 subspace
    # (equivalent to (U1 @ U1.T) @ indian_pines, since U1 @ U1.T is symmetric)
    y_pred = torch.tensordot(U1 @ torch.t(U1), indian_pines, ([0], [0]))
    # loss: Frobenius norm of the reconstruction error
    l = torch.norm(indian_pines - y_pred, 'fro')
    if epoch % 10 == 0:
        print(f'epoch: {epoch} loss: {l}')
    # gradient
    l.backward()
    # update: manual gradient descent step, outside of autograd
    with torch.no_grad():
        U1 -= learning_rate * U1.grad
        U1.grad.zero_()
with the following example output (exact values vary because of the random initialization):
epoch: 0 loss: 44439840488.652824
epoch: 10 loss: 27657067086.461464
epoch: 20 loss: 17353003250.14576
epoch: 30 loss: 10980377562.427532
epoch: 40 loss: 7000015690.042022
epoch: 50 loss: 4478747227.40419
epoch: 60 loss: 2847777701.784741
epoch: 70 loss: 1757431994.7743077
epoch: 80 loss: 990962121.4576876
epoch: 90 loss: 426658102.95583844
This loss seems very high, and it gets even worse if I increase learning_rate; decreasing it, of course, only makes the loss go down more slowly. My question is: is there something wrong with the way I use autograd that results in such a high loss? How could I improve the quality of the result? Thanks in advance.