Let us take a look at a simple class:
import torch.nn as nn
import torch.nn.functional as F

class Temp1(nn.Module):
    def __init__(self, stateSize, actionSize, layers=[10, 5], activations=[F.tanh, F.tanh]):
        super(Temp1, self).__init__()
        # Three fully connected layers: state -> hidden -> hidden -> action
        self.layer1 = nn.Linear(stateSize, layers[0])
        self.layer2 = nn.Linear(layers[0], layers[1])
        self.fcFinal = nn.Linear(layers[1], actionSize)
This is a fairly straightforward PyTorch module. It creates a simple sequential dense network. If we check its parameters, we see the following:
t1 = Temp1(2, 2)
list(t1.parameters())
This is the expected result ...
[Parameter containing:
 tensor([[-0.0311, -0.5513],
         [-0.0634, -0.3783],
         [-0.2514,  0.6139],
         [ 0.4711, -0.0241],
         [-0.1739,  0.2208],
         [-0.1533,  0.3838],
         [-0.6490, -0.5784],
         [ 0.5312,  0.6703],
         [ 0.3506,  0.3652],
         [ 0.1768, -0.4158]], requires_grad=True), Parameter containing:
 tensor([-0.3199, -0.4154, -0.5530, -0.6738, -0.4411,  0.2641, -0.3576,  0.0447,
          0.0254,  0.0965], requires_grad=True), Parameter containing:
 tensor([[-2.8257e-01,  6.7583e-02,  9.0356e-02,  1.0868e-01,  4.0876e-02,
           4.0616e-02,  4.4419e-02, -8.1544e-02,  2.5244e-01,  3.8777e-03],
         [-8.0950e-03, -1.4175e-01, -2.9492e-01,  3.1439e-01, -2.3065e-01,
          -6.6631e-02,  3.0047e-01,  2.8353e-01,  2.3457e-01, -3.1399e-03],
         [-5.2522e-02, -2.2183e-01, -1.5485e-01,  2.6317e-01,  2.8273e-01,
          -7.4823e-02, -5.3704e-02,  9.3526e-02, -1.7916e-01, -3.1132e-04],
         [ 8.9063e-02,  2.9263e-01, -1.0052e-01,  8.7005e-02, -1.1246e-01,
          -2.7968e-01,  4.1411e-02, -1.6776e-01,  1.2363e-01, -2.2808e-01],
         [ 2.9244e-02,  5.8296e-02, -2.9729e-01, -3.1437e-01, -9.3182e-02,
          -7.5236e-03,  5.6159e-02, -2.2075e-02,  1.0337e-01,  8.1123e-02]],
        requires_grad=True), Parameter containing:
 tensor([ 0.2240,  0.0997, -0.0047, -0.1784, -0.0369], requires_grad=True), Parameter containing:
 tensor([[ 0.3546, -0.2180,  0.1723, -0.0463,  0.2572],
         [-0.1669, -0.1364, -0.0398,  0.2233, -0.1805]], requires_grad=True), Parameter containing:
 tensor([ 0.0871, -0.1698], requires_grad=True)]
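As a sanity check (this inspection is my addition, not part of the listing above), the shapes confirm that all three Linear layers were registered, with a weight and a bias for each:

[p.shape for p in t1.parameters()]
# [torch.Size([10, 2]), torch.Size([10]),  # layer1:  2 -> 10
#  torch.Size([5, 10]), torch.Size([5]),   # layer2:  10 -> 5
#  torch.Size([2, 5]),  torch.Size([2])]   # fcFinal: 5 -> 2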
Now, let us try to generalize this a bit:
class Temp(nn.Module):
    def __init__(self, stateSize, actionSize, layers=[10, 5], activations=[F.tanh, F.tanh]):
        super(Temp, self).__init__()
        # Generate the fully connected layers in a loop, chaining the sizes
        self.fcLayers = []
        oldN = stateSize
        for i, layer in enumerate(layers):
            self.fcLayers.append(nn.Linear(oldN, layer))
            oldN = layer
        self.fcFinal = nn.Linear(oldN, actionSize)
It turns out that this module no longer reports the same set of parameters ...
t = Temp(2, 3)
list(t.parameters())
[Parameter containing:
 tensor([[-0.3342,  0.4111,  0.0418,  0.4457,  0.0648],
         [ 0.4364, -0.0360, -0.2239,  0.4025,  0.1661],
         [ 0.1932, -0.0896,  0.3269, -0.2179,  0.1035]], requires_grad=True),
 Parameter containing:
 tensor([-0.2867, -0.1354, -0.0026], requires_grad=True)]
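To make the difference concrete, a quick count over the two instances above (again my own check):

len(list(t1.parameters()))   # 6 -- weight + bias for each of the three layers
len(list(t.parameters()))    # 2 -- only fcFinal's weight and bias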
I believe I understand why this is happening: the Linear modules stored in the plain Python list are never registered as submodules, so nn.Module cannot see their parameters. The bigger question is, how do we overcome this problem? The second, generalized version, for example, will not be moved to the GPU properly, and its hidden layers will not be trained by an optimizer.
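For reference, the mechanism I suspect is intended here is nn.ModuleList, PyTorch's container that registers every module it holds. A minimal sketch of the generalized class rewritten with it (untested against the rest of my code, and the forward pass is still omitted):

import torch.nn as nn
import torch.nn.functional as F

class Temp(nn.Module):
    def __init__(self, stateSize, actionSize, layers=[10, 5], activations=[F.tanh, F.tanh]):
        super(Temp, self).__init__()
        # nn.ModuleList registers each appended Linear as a submodule, so its
        # parameters appear in .parameters(), move with .to(device) / .cuda(),
        # and are picked up by optimizers.
        self.fcLayers = nn.ModuleList()
        oldN = stateSize
        for i, layer in enumerate(layers):
            self.fcLayers.append(nn.Linear(oldN, layer))
            oldN = layer
        self.fcFinal = nn.Linear(oldN, actionSize)

Is this the right approach, or is there a more idiomatic way to register a variable number of layers?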