
I am trying to manage memory usage with 'gc.collect()' to solve a 'CUDA out of memory' error while running my code.

However, even after calling 'del' and 'gc.collect()', the allocated memory does not decrease: the memory is not freed when I watch it through 'nvidia-smi' in the terminal.
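
For reference, this is roughly how I compare what 'nvidia-smi' reports with PyTorch's own counters (a minimal sketch, assuming a single default CUDA device):

    import torch

    # What live tensors currently occupy on the default device
    print(f"allocated: {torch.cuda.memory_allocated() / 1024**2:.1f} MiB")
    # What PyTorch's caching allocator has reserved from the driver;
    # this is roughly the number 'nvidia-smi' shows for the process
    print(f"reserved:  {torch.cuda.memory_reserved() / 1024**2:.1f} MiB")

    # Returns unused cached blocks to the driver (I do not call this in my code)
    torch.cuda.empty_cache()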

Here is my code below.

    # This is the loss calculation code for training the model
    loss = self.train_on_batch(batch) / self.exp_dict["tasks_per_batch"]

The internal code of the 'train_on_batch' function is as follows.

    def train_on_batch(self, batch):
        episode = batch[0]
        nclasses = episode["nclasses"]
        support_size = episode["support_size"]
        query_size = episode["query_size"]
        labels = episode["targets"].view(support_size + query_size, nclasses, -1).cuda(non_blocking=True).long()
        k = (support_size + query_size)
        c = episode["channels"]
        h = episode["height"]
        w = episode["width"]

        tx = episode["support_set"].view(support_size, nclasses, c, h, w).cuda(non_blocking=True)
        vx = episode["query_set"].view(query_size, nclasses, c, h, w).cuda(non_blocking=True)
        x = torch.cat([tx, vx], 0)
        x = x.view(-1, c, h, w).cuda(non_blocking=True)
        if self.ngpu > 1:
            embeddings = self.parallel_model(x, is_support=True)
        else:
            embeddings = self.model(x, is_support=True)
        b, c = embeddings.size()

        logits = self.get_logits(embeddings, support_size, query_size, nclasses)
        
        loss = 0
        if self.exp_dict["classification_weight"] > 0:
            loss += F.cross_entropy(self.model.classifier(embeddings.view(b, c)), labels.view(-1)) * self.exp_dict["classification_weight"]

        query_labels = torch.arange(nclasses, device=logits.device).view(1, nclasses).repeat(query_size, 1).view(-1)
        loss += F.cross_entropy(logits, query_labels) * self.exp_dict["few_shot_weight"]
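        # Try to drop the local references to the large intermediate tensors before returning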
        del tx
        del vx
        del x
        del logits
        del embeddings
        print(gc.get_count())
        gc.collect()
        print(gc.get_count())
        return loss

The variable that takes up the most memory is 'embeddings' (about 4 GB), but it seems to remain allocated even after gc.collect().

I also applied 'del' to the other variables that use the 'embeddings' variable, thinking that its reference count might not have reached 0 and so it was not being freed, but the same problem still occurs.
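
For comparison, here is a minimal standalone sketch (illustrative sizes; it assumes a CUDA device is available) of what I expect 'del' plus 'gc.collect()' to do for a plain tensor that has no other references:

    import gc

    import torch

    t = torch.empty(64, 1024, 1024, device="cuda")  # ~256 MiB of float32
    print(torch.cuda.memory_allocated() / 1024**2, "MiB")  # ~256

    del t
    gc.collect()
    print(torch.cuda.memory_allocated() / 1024**2, "MiB")  # ~0

    # 'nvidia-smi' can still show the memory as used, because the caching
    # allocator keeps it reserved; empty_cache() hands it back to the driver.
    torch.cuda.empty_cache()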

Thanks in advance for any help figuring out what the problem is.

  • According to https://stackoverflow.com/questions/54374935/how-to-fix-this-strange-error-runtimeerror-cuda-error-out-of-memory, reducing batch size might help, among other things, like using no_grad or checking for background processes. – Dennis Apr 25 '22 at 05:15

0 Answers