
As stated in the question, I need to attach a tensor to a particular point in the computation graph in PyTorch.

What I'm trying to do is this: while getting the outputs from all mini-batches, accumulate them in a list, and when one epoch finishes, calculate their mean. Then I need to calculate the loss from that mean, so backpropagation must flow through all of these operations.
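
A minimal sketch of the loop I have in mind (the model, data, and per-epoch target below are just placeholders, not my real setup):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for the real model, data, and target.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loader = DataLoader(TensorDataset(torch.randn(64, 10)), batch_size=16)
epoch_target = torch.tensor(0.0)  # hypothetical scalar target for the epoch mean

for epoch in range(2):
    outputs = []
    for (x,) in loader:                      # forward pass for each mini-batch
        outputs.append(model(x))             # keeping these attached to the graph is what grows memory
    epoch_mean = torch.cat(outputs).mean()   # mean over every output of the epoch
    loss = (epoch_mean - epoch_target) ** 2  # loss computed from the epoch-level mean
    optimizer.zero_grad()
    loss.backward()                          # backprop must flow through every mini-batch forward
    optimizer.step()
```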

I am able to do this when the training data is small (without detaching, just storing the outputs). However, it is not possible once the data gets bigger: if I don't detach the output tensors each time, I run out of GPU memory, and if I do detach them, I lose their connection to the computation graph. It also doesn't seem to matter how many GPUs I have, since PyTorch appears to use only the first 4 of them for storing the output tensors when I don't detach before saving them into the list, even if I assign more than 4 GPUs.
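
To make the trade-off concrete, here is a tiny sketch (with a throwaway linear layer as a stand-in) of why detaching solves the memory problem but breaks what I need:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
x = torch.randn(16, 10)

kept = model(x)               # attached: backprop works, but activations stay in memory
stored = model(x).detach()    # detached: memory is freed, but the graph link is gone

print(kept.requires_grad)     # True
print(stored.requires_grad)   # False

kept.mean().backward()        # fine: gradients flow back into the model
# stored.mean().backward()    # would raise an error, since there is no grad_fn to follow
```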

Any help is really appreciated.

Thanks.

  • How many mini-batches are in an epoch? Your problem looks a little like an RNN's backpropagation through time; I look forward to an answer.... – Shihab Shahriar Khan Mar 16 '19 at 17:03
  • Mini-batch size indeed does not matter, since I'm trying to backprop per epoch. It only changes the time until I run out of memory, because I need to pass all the training data through and store the outputs. I tried many options between 16 and 128. – Aka Mar 16 '19 at 19:40
  • I actually asked for the number of mini-batches per epoch, not the size. That value directly affects memory. – Shihab Shahriar Khan Mar 17 '19 at 05:21
  • The number of mini-batches per epoch changes depending on the batch size, but it's not important, since I have to wait until the end of each epoch, which means all the data must be passed through the net, whether in 20 mini-batches or 89. Therefore the number of mini-batches per epoch doesn't matter. – Aka Mar 18 '19 at 08:52
  • If you store the output of a net, PyTorch has to retain the whole graph behind it. So if you have 10 outputs from 10 mini-batches, PyTorch will keep 10 versions of the net's intermediate state. I'm not absolutely certain, but I'm pretty sure the number of mini-batches is what matters, not the size of the dataset. – Shihab Shahriar Khan Mar 18 '19 at 11:21
  • Well, if PyTorch retains the whole net rather than just the output scores, then yes, it matters. I don't have a good knowledge of PyTorch's internals; forgive my ignorance. – Aka Mar 18 '19 at 11:26

0 Answers