How to resume training from the saved snapshot in Chainer

I was trying to implement DCGAN in Chainer using the following GitHub example:

https://github.com/chainer/chainer/blob/master/examples/dcgan/train_dcgan.py

When I pass the --resume parameter, it fails with a shape mismatch error in the network.

The Python script has an option to specify the snapshot from which training should resume. These snapshots are saved automatically to the result folder, which is also given as an argument in the code. So I tried to resume training from a saved snapshot with the command below.

$ python train.py --resume 'snapshot.npz'

where train.py is the DCGAN code modified to use labels with the CIFAR-10 dataset.

The error I got from the above command is:

chainer.utils.type_check.InvalidType: 
Invalid operation is performed in: LinearFunction (Forward)

Expect: x.shape[1] == W.shape[1]
Actual: 110 != 100

When I run the script with the command below, there is no error:

$ python train.py
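
For readers trying to interpret the message: the numbers are consistent with a 100-dimensional noise vector being concatenated with a 10-dimensional label vector (CIFAR-10 has 10 classes) and passed to a linear layer whose weight expects only 100 inputs. The framework-free sketch below only mimics that check. The batch size, the output width of 512, and the helper names are assumptions for illustration, not values taken from the actual script, and it does not explain why the weight had that shape.

```python
def concat_width(*widths):
    """Feature width after concatenating inputs along axis=1."""
    return sum(widths)


def check_linear_input(x_shape, w_shape):
    """Mimic the check in the error: x.shape[1] must equal W.shape[1].

    w_shape follows the (out_size, in_size) layout of a linear layer weight.
    Returns the output shape when the check passes.
    """
    if x_shape[1] != w_shape[1]:
        raise ValueError(
            "Expect: x.shape[1] == W.shape[1]\n"
            "Actual: {} != {}".format(x_shape[1], w_shape[1])
        )
    return (x_shape[0], w_shape[0])


batch = 64       # assumed batch size
n_hidden = 100   # assumed noise size, matching the 100 in the error
n_classes = 10   # CIFAR-10 has 10 classes

# Shape of the concatenated (z, label) input: (64, 110)
x_shape = (batch, concat_width(n_hidden, n_classes))

# A weight sized for the concatenated input passes the check.
out_ok = check_linear_input(x_shape, (512, 110))

# A weight sized for the noise vector alone reproduces the mismatch.
try:
    check_linear_input(x_shape, (512, 100))
    mismatch = None
except ValueError as exc:
    mismatch = str(exc)

print(out_ok)
print(mismatch)
```

If the weight in memory has 100 input columns at the moment the labelled generator runs, the check fails exactly as in the error above, so comparing the shapes stored in the snapshot with the shapes the current network expects is a reasonable first diagnostic.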

Complete error trace:

Exception in main training loop: 
Invalid operation is performed in: LinearFunction (Forward)
Expect: x.shape[1] == W.shape[1]
Actual: 110 != 100
Traceback (most recent call last):
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/training/trainer.py, line 315, in run
    update()
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py, line 165, in update
    self.update_core()
  File /home/964769/Lakshmi/DCGAN/updater_with_label.py, line 50, in update_core
    x_fake = gen(z,labels)
  File /home/964769/Lakshmi/DCGAN/net_with_label.py, line 61, in __call__
    h = F.reshape(F.relu(self.bn0(self.l0(F.concat((z,t),axis=1)))),
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/link.py, line 242, in __call__
    out = forward(*args, **kwargs)
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/links/connection/linear.py, line 138, in forward
    return linear.linear(x, self.W, self.b, n_batch_axes=n_batch_axes)
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/functions/connection/linear.py, line 288, in linear
    y, = LinearFunction().apply(args)
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/function_node.py, line 245, in apply
    self.check_data_type_forward(in_data)
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/function_node.py, line 330, in check_data_type_forward
    self.check_type_forward(in_type)
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/functions/connection/linear.py, line 27, in check_type_forward
    x_type.shape[1] == w_type.shape[1],
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/utils/type_check.py, line 546, in expect
    expr.expect()
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/utils/type_check.py, line 483, in expect
    '{0} {1} {2}'.format(left, self.inv, right))
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
  File train.py, line 140, in <module>
    main()
  File train.py, line 135, in main
    trainer.run()
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/training/trainer.py, line 329, in run
    six.reraise(*sys.exc_info())
  File /home/964769/anaconda3/lib/python3.6/site-packages/six.py, line 686, in reraise
    raise value
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/training/trainer.py, line 315, in run
    update()
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py, line 165, in update
    self.update_core()
  File /home/964769/Lakshmi/DCGAN/updater_with_label.py, line 50, in update_core
    x_fake = gen(z,labels)
  File /home/964769/Lakshmi/DCGAN/net_with_label.py, line 61, in __call__
    h = F.reshape(F.relu(self.bn0(self.l0(F.concat((z,t),axis=1)))),
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/link.py, line 242, in __call__
    out = forward(*args, **kwargs)
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/links/connection/linear.py, line 138, in forward
    return linear.linear(x, self.W, self.b, n_batch_axes=n_batch_axes)
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/functions/connection/linear.py, line 288, in linear
    y, = LinearFunction().apply(args)
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/function_node.py, line 245, in apply
    self.check_data_type_forward(in_data)
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/function_node.py, line 330, in check_data_type_forward
    self.check_type_forward(in_type)
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/functions/connection/linear.py, line 27, in check_type_forward
    x_type.shape[1] == w_type.shape[1],
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/utils/type_check.py, line 546, in expect
    expr.expect()
  File /home/964769/anaconda3/lib/python3.6/site-packages/chainer/utils/type_check.py, line 483, in expect
    '{0} {1} {2}'.format(left, self.inv, right))
chainer.utils.type_check.InvalidType: 
Invalid operation is performed in: LinearFunction (Forward)
Expect: x.shape[1] == W.shape[1]
Actual: 110 != 100
Yuki Hashimoto
Lakshmi - Intel
  • Did you specify the `snapshot` filepath? https://github.com/chainer/chainer/blob/master/examples/dcgan/train_dcgan.py#L94 Can you share the command & error message? – corochann Feb 27 '19 at 05:51
  • I was trying to train on the CIFAR-10 dataset with labels. The command used to run the Python file is: `python train.py --resume 'snapshot.npz'`. The error is: `chainer.utils.type_check.InvalidType: Invalid operation is performed in: LinearFunction (Forward) Expect: x.shape[1] == W.shape[1] Actual: 110 != 100` – Lakshmi - Intel Feb 27 '19 at 06:25
  • Where did you get the snapshot? In general, can you please come up with a detailed problem description? Move your command and output from the comments to the question body. Add the details necessary to reproduce the problem. – Artem Trunov Feb 27 '19 at 08:44
  • Are you using a `snapshot.npz` that was trained with a different configuration? What happens if you run `python train.py` with the modified script to get a new `snapshot.npz`, followed by `python train.py --resume snapshot.npz`? – corochann Feb 27 '19 at 09:57
  • No, I am using a snapshot from the same configuration to resume training. – Lakshmi - Intel Feb 27 '19 at 10:50
  • When I give the snapshot file name without the .npz extension, it shows "No such file or directory". – Lakshmi - Intel Feb 27 '19 at 13:25
  • Just to make sure, does `snapshot.npz` exist in the same folder as `train.py`? – evgeni fotia Feb 27 '19 at 16:12
  • Can you show the whole error message, which may start with `Traceback (most recent call last):`? – Yuki Hashimoto Feb 28 '19 at 01:25
  • When I run `print(chainer.serializers.load_npz(args.resume, trainer))`, it prints None even though I give a snapshot to resume from. The snapshot I gave is available in the result folder, and I gave its full path. There is no issue with the snapshot that was saved. – Lakshmi - Intel Feb 28 '19 at 05:36
  • It got resolved when I updated the NumPy version. Thank you. – Lakshmi - Intel Feb 28 '19 at 08:18
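
A note on the `print(...)` check mentioned in the comments: `chainer.serializers.load_npz` fills the target object in place and has no return value, so printing its result shows None even when loading succeeds. The plain-Python sketch below illustrates that pattern only. `load_into`, `saved`, and `state` are hypothetical names and values, not part of Chainer or of the asker's script.

```python
def load_into(saved, target):
    """Copy saved values into target in place.

    Like an in-place deserializer, this has no return statement,
    so calling it always evaluates to None.
    """
    for key, value in saved.items():
        if key not in target:
            raise KeyError("unexpected key in saved state: {}".format(key))
        target[key] = value


# Hypothetical trainer state before and after resuming.
state = {"iteration": 0, "epoch": 0}
saved = {"iteration": 5000, "epoch": 6}

result = load_into(saved, state)

print(result)  # None, even though loading succeeded
print(state)   # the target object now holds the saved values
```

To confirm that a resume actually took effect, it is more informative to inspect the loaded object itself (for example, its iteration counter) than the loader's return value.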

0 Answers