I am trying to compute the gradient of a variable in PyTorch, but I get a RuntimeError telling me that the shapes of the output and the grad must be the same. In my case, however, the output and the grad cannot have the same shape. Here is my code to reproduce the issue:
import numpy as np
import torch
from torch.autograd import Variable as V
ne = 3
m, n = 79, 164
G = np.random.rand(m, n).astype(np.float64)
w = np.random.rand(n, n).astype(np.float64)
z = -np.random.rand(n).astype(np.float64)
G = V(torch.from_numpy(G))
w = V(torch.from_numpy(w))
z = V(torch.from_numpy(z), requires_grad=True)
e, v = torch.symeig(torch.diag(2 * z - torch.sum(w, dim=1)) + w, eigenvectors=True, upper=False)
ssev = torch.sum(torch.pow(e[-ne:] * v[:, -ne:], 2), dim=1)
out = torch.sum(torch.matmul(G, ssev.reshape((n, 1))))
out.backward(z)  # this line raises the RuntimeError shown below
print(z.grad)
The error message is: RuntimeError: Mismatch in shape: grad_output[0] has a shape of torch.Size([164]) and output[0] has a shape of torch.Size([])
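For reference, the same shape mismatch can be reproduced in isolation. Below is a minimal sketch (the tensors x and s are made up purely for illustration): passing a gradient argument whose shape differs from that of a 0-dim output triggers the identical error, while calling backward() on the scalar without an argument does not.

import torch
x = torch.ones(4, dtype=torch.float64, requires_grad=True)
s = x.sum()                    # 0-dim (scalar) output, shape torch.Size([])
# s.backward(torch.ones(4))    # raises the same shape-mismatch RuntimeError
s.backward()                   # no gradient argument needed for a scalar output
print(x.grad)                  # tensor([1., 1., 1., 1.], dtype=torch.float64)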
A similar calculation works in TensorFlow, and I can successfully get the gradient I want:
import numpy as np
import tensorflow as tf
m, n = 79, 164
G = np.random.rand(m, n).astype(np.float64)
w = np.random.rand(n, n).astype(np.float64)
z = -np.random.rand(n).astype(np.float64)
def tf_function(z, G, w, ne=3):
    e, v = tf.linalg.eigh(tf.linalg.diag(2 * z - tf.reduce_sum(w, 1)) + w)
    ssev = tf.reduce_sum(tf.square(e[-ne:] * v[:, -ne:]), 1)
    return tf.reduce_sum(tf.matmul(G, tf.expand_dims(ssev, 1)))
z, G, w = [tf.convert_to_tensor(_, dtype=tf.float64) for _ in (z, G, w)]
z = tf.Variable(z)
with tf.GradientTape() as g:
    g.watch(z)
    out = tf_function(z, G, w)
print(g.gradient(out, z).numpy())
My TensorFlow version is 2.0, my PyTorch version is 1.14.0, and I am using Python 3.6.9. In my opinion, computing gradients when the output and the variables have different shapes is perfectly reasonable, and I don't think I made any mistake. Can anyone help me with this problem? I really appreciate it!