
I'm trying to swap the ResNet blocks in my current model for ResNeXt blocks. Everything worked, and I even trained the model for 1000+ epochs with the ResNet blocks, but when I added the following class to the model it returned the error below. (It runs without errors on my local CPU, but I get the error when running on Colab.)

Added class:

class GroupConv1D(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, padding, stride, groups):
        super(GroupConv1D, self).__init__()

        if not in_channels % groups == 0:
            raise ValueError("The input channels must be divisible by the no. of groups")
        if not out_channels % groups == 0:
            raise ValueError("The output channels must be divisible by the no. of groups")

        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.groups = groups

        self.group_in_num = in_channels // groups
        self.group_out_num = out_channels // groups
        self.conv_list = []

        for i in range(self.groups):
            self.conv_list.append(
                nn.Conv1d(
                    in_channels=self.group_out_num,
                    out_channels=self.group_out_num,
                    kernel_size=kernel_size,
                    stride=stride,
                    padding=padding)
            )

    def forward(self, inputs):
        feature_map_list = []
        for i in range(self.groups):
            x_i = self.conv_list[i](
                inputs[:, i * self.group_in_num: (i + 1) * self.group_in_num]
            )
            feature_map_list.append(x_i)

        out = torch.concat(feature_map_list, dim=1)
        return out

The error:

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/content/drive/MyDrive/FYPprototypeTest2/train.py", line 268, in <module>
    cycleGAN.trainModel()
  File "/content/drive/MyDrive/FYPprototypeTest2/train.py", line 140, in trainModel
    B_fake = self.A_generator_B(A_real, A_mask)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/drive/MyDrive/FYPprototypeTest2/model.py", line 235, in forward
    resnet_block_1 = self.resnet_block_1(conv2d_conv1d)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/drive/MyDrive/FYPprototypeTest2/model.py", line 88, in forward
    group_layer = self.groupConv_1(layer_one_GLU)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/drive/MyDrive/FYPprototypeTest2/model.py", line 46, in forward
    inputs[:, i * self.group_in_num: (i + 1) * self.group_in_num]
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 301, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 298, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Input type (torch.cuda.FloatTensor) and weight type (torch.FloatTensor) should be the same

Help would be hugely appreciated.

  • You're casting your inputs to the GPU but your model isn't; it's still sitting on your CPU. Make sure you run `model.to(device)` prior to your training or inference, where `device` is your CUDA GPU ordinal. – jhso Jan 24 '22 at 05:54
  • Thanks for the reply, but I'm actually moving all my models to the relevant device (e.g. `self.A_generator_B = Generator().to(self.device)`), and this is why everything works fine on my CPU. Also, everything worked fine on the GPU before I added the above class. I strongly believe the issue is within that class itself; I think something in it defaults to the CPU, but I couldn't spot the exact line. – pasindu Jan 24 '22 at 06:17
  • Please edit your question with relevant code, demonstrating how you create your model and insert the above class. – jhso Jan 24 '22 at 06:55
  • You'll be able to find and recreate the error by running the model here: [link](https://github.com/Pasinduekanayake/testingModel) – pasindu Jan 24 '22 at 07:16
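As a minimal sketch of the pattern the first comment describes (the `nn.Conv1d` stand-in for the real `Generator` is hypothetical; only the `.to(device)` calls are the point):

import torch
import torch.nn as nn

# Hypothetical stand-in for the Generator in the question.
model = nn.Conv1d(16, 16, kernel_size=3, padding=1)

# Pick the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = model.to(device)                          # moves all *registered* parameters and buffers
inputs = torch.randn(1, 16, 100, device=device)   # the input must live on the same device
outputs = model(inputs)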

1 Answer


The problem in your new class GroupConv1D is that you store all your convolution modules in a regular Python list, self.conv_list, instead of in an nn container (e.g., nn.ModuleList).
All methods that affect nn.Modules (e.g., .to(device), .eval(), etc.) are applied recursively to all relevant members of the "root" nn.Module.
But how can PyTorch tell which members are relevant?
For this you have containers: they register sub-modules, buffers and parameters so that PyTorch can recursively apply all relevant nn.Module methods to them.

See, e.g., this answer.
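
As a minimal sketch of the fix (not code from the linked repository), the class can be rewritten around nn.ModuleList so the per-group convolutions are registered as sub-modules and follow the parent module when .to(device) is called; the sketch also sets each per-group conv's in_channels to group_in_num, matching the channel slices taken in forward:

import torch
import torch.nn as nn

class GroupConv1D(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, padding, stride, groups):
        super().__init__()
        if in_channels % groups != 0:
            raise ValueError("The input channels must be divisible by the no. of groups")
        if out_channels % groups != 0:
            raise ValueError("The output channels must be divisible by the no. of groups")

        self.groups = groups
        self.group_in_num = in_channels // groups
        self.group_out_num = out_channels // groups

        # nn.ModuleList registers every Conv1d as a sub-module, so .to(device),
        # .cuda(), .eval(), .parameters(), state_dict(), etc. reach their weights.
        self.conv_list = nn.ModuleList(
            nn.Conv1d(
                in_channels=self.group_in_num,    # each group sees group_in_num channels
                out_channels=self.group_out_num,
                kernel_size=kernel_size,
                stride=stride,
                padding=padding,
            )
            for _ in range(groups)
        )

    def forward(self, inputs):
        # Convolve each channel slice with its own Conv1d, then concatenate
        # the group outputs along the channel dimension.
        feature_map_list = [
            conv(inputs[:, i * self.group_in_num:(i + 1) * self.group_in_num])
            for i, conv in enumerate(self.conv_list)
        ]
        return torch.cat(feature_map_list, dim=1)

With the convolutions registered this way, Generator().to(self.device) also moves their weights to the GPU, which removes the torch.cuda.FloatTensor / torch.FloatTensor mismatch. Note that nn.Conv1d already accepts a groups= argument, so a single nn.Conv1d(in_channels, out_channels, kernel_size, stride=stride, padding=padding, groups=groups) would usually be the simpler replacement for the whole class.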

Shai