
I am generating BERT embeddings on a GPU and using them to train a CatBoost model. The embedding generation runs on the GPU without any issue. The problem occurs when I try to convert these tensors to NumPy: almost all the RAM is consumed (approx. 20 GB), although the total training data size is only 2 GB.

The environment I am running in is:

  1. Google Kubernetes Engine: 1.21.11-gke.900
  2. CUDA 11.0
  3. PyTorch 1.11.0
  4. nvidia/cuda:11.0.3-base-ubuntu20.04 Docker image

Here is a sample piece of code along with its memory profile:

```
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   568   1834.4 MiB   1834.4 MiB           1       @staticmethod
   569                                             @profile(stream=fp)
   570                                             def prepare_train_test_df(df_final):
   571   1834.4 MiB      0.0 MiB           1           LOG.info("Prepare train test sample.....")
   572   5532.2 MiB      0.4 MiB           2           df_final["title_cat_ar"] = df_final["title_cat_embed2_1"].apply(
   573   5531.8 MiB   3697.4 MiB     1876353               lambda row: row.detach().cpu().numpy()
   574                                                 )
   575   5532.2 MiB      0.0 MiB           1           LOG.info("Converting column {} to numpy.....".format("title_cat_embed2_1"))
   576
   577   8820.9 MiB      0.0 MiB           2           df_final["desc_embed_ar_1"] = df_final["desc_embed2_1"].apply(
   578   8820.8 MiB   3288.7 MiB     1876353               lambda row: row.detach().cpu().numpy()
   579                                                 )
   580   8820.9 MiB      0.0 MiB           1           LOG.info("Converting column {} to numpy.....".format("desc_embed2_1"))
   581
   582  12109.9 MiB      1.0 MiB           2           df_final["title_cat_embed_ar_2"] = df_final["title_cat_embed2_2"].apply(
   583  12108.9 MiB   3288.0 MiB     1876353               lambda row: row.detach().cpu().numpy()
   584                                                 )
   585  12109.9 MiB      0.0 MiB           1           LOG.info("Converting column {} to numpy.....".format("title_cat_embed2_2"))
   586
   587  15397.7 MiB      0.8 MiB           2           df_final["desc_embed_ar_2"] = df_final["desc_embed2_2"].apply(
   588  15396.9 MiB   3287.0 MiB     1876353               lambda row: row.detach().cpu().numpy()
   589                                                 )
```

As seen from the profiler report, almost 3 GB of memory is occupied after the `row.detach().cpu().numpy()` operation on each pandas DataFrame column. The size of each column in the DataFrame itself is not that high (max is 112 MB), so it seems like something else is consuming the memory.
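As a quick sanity check (pure arithmetic, using the row count and one of the ~3.3 GiB increments from the profiler output above), the increment works out to on the order of a few hundred float32 values per row, i.e. roughly the size of one embedding vector per row materialized in host RAM:

```python
# Numbers taken from the profiler output above.
n_rows = 1_876_353
increment_mib = 3288.7

bytes_per_row = increment_mib * 2**20 / n_rows   # ≈ 1838 bytes per row
floats_per_row = bytes_per_row / 4               # assuming float32 embeddings
print(f"{bytes_per_row:.0f} bytes/row ~= {floats_per_row:.0f} float32 values/row")
```

So the column-wise increments are consistent with the full embedding data of every row being copied into CPU memory, not just small wrapper objects.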

## Memory consumption by column, in bytes

```
title_embed            60043264
title_embed2           60043264
title_cat_ar          112581120
```
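One possible reason these per-column numbers look so small: `memory_usage(deep=True)` sizes each object in an object-dtype column with `sys.getsizeof`, and a NumPy array that is a *view* (which is what `tensor.numpy()` returns, since it shares the tensor's storage) reports only its header, not the underlying buffer. A minimal sketch, using plain NumPy views as a stand-in for the tensor-backed arrays (the exact accounting for `torch.Tensor` objects differs, but `sys.getsizeof` does not see their storage either):

```python
import sys

import numpy as np
import pandas as pd

owner = np.zeros(768, dtype=np.float32)  # owns its 3072-byte buffer
view = owner[:]                          # view: same buffer, not owned

print(sys.getsizeof(owner))  # header + 3072 data bytes
print(sys.getsizeof(view))   # header only; the buffer is not counted

# A column of views looks tiny to pandas even though the buffers are large:
col = pd.Series([owner[:] for _ in range(1000)], dtype=object)
print(col.memory_usage(deep=True))  # far less than the buffer bytes below
print(1000 * owner.nbytes)          # actual buffer bytes behind the column
```

If that is what is happening here, the 112 MB figure would be counting only the per-row object headers, while the real embedding storage sits elsewhere and is invisible to `memory_usage`.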

Assigning a single tensor to a variable and then measuring memory consumption: for some reason, a single tensor conversion occupies 799 MB of space:

```
Line #    Mem usage    Increment  Occurrences   Line Contents
=============================================================
   568   1835.1 MiB   1835.1 MiB           1       @staticmethod
   569                                             @profile(stream=fp)
   570                                             def prepare_train_test_df(df_final):
   571   1835.1 MiB      0.0 MiB           1           LOG.info("Prepare train test sample.....")
   572   1835.1 MiB      0.0 MiB           1           first_row = df_final["title_cat_embed2_1"].iloc[0]
   573   2634.2 MiB    799.1 MiB           1           first_row_numpy = first_row.detach().cpu().numpy()
   574   2634.2 MiB      0.0 MiB           1           frow = df_final["title_cat_embed2_1"].iloc[1].tolist()
   575   5532.7 MiB      0.2 MiB           2           df_final["title_cat_ar"] = df_final["title_cat_embed2_1"].apply(
   576   5532.5 MiB   2898.3 MiB     1876353               lambda row: row.detach().cpu().numpy()
   577                                                 )
   578   5526.2 MiB     -6.5 MiB           1           del df_final["title_cat_embed2_1"]
   579   5526.2 MiB      0.0 MiB           1           LOG.info("mempry consumed is  {}".format(df_final.memory_usage(deep=True)))
   580   5526.2 MiB      0.0 MiB           1           LOG.info("Converting column {} to numpy.....".format("title_cat_embed2_1"))
   581
   582   5526.2 MiB      0.0 MiB           1           desc_row = df_final["desc_embed2_1"].iloc[0]
   583   5526.2 MiB      0.0 MiB           1           desc_row = desc_row.detach().cpu().numpy()
   584   5526.2 MiB      0.0 MiB           1           drow = df_final["desc_embed2_1"].iloc[1].tolist()
   585   8814.8 MiB      0.2 MiB           2           df_final["desc_embed_ar_1"] = df_final["desc_embed2_1"].apply(
   586   8814.6 MiB   3288.4 MiB     1876353               lambda row: row.detach().cpu().numpy()
   587                                                 )
   588   8807.7 MiB     -7.2 MiB           1           del df_final["desc_embed2_1"]
```
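One mitigation I am considering (a sketch under the assumption that every tensor in a column has the same shape, shown here with small hypothetical tensors): instead of producing ~1.9M tiny ndarrays one `.apply()` call at a time, stack the column's tensors once and do a single device-to-host copy into one contiguous matrix:

```python
import torch

# Hypothetical stand-in for a column of per-row embedding tensors.
rows = [torch.zeros(16) for _ in range(1000)]

# One stack + one host copy + one contiguous ndarray,
# instead of one small ndarray per row:
mat = torch.stack(rows).detach().cpu().numpy()
print(mat.shape)   # (1000, 16)
print(mat.nbytes)  # 64000 bytes in a single buffer
```

The resulting 2-D array could then be passed to CatBoost directly, and the object column dropped, rather than keeping millions of per-row Python objects alive.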
  • As far as I know, `.detach()` and `.numpy()` return a tensor/array that shares the same memory, and `.cpu()` does not copy when the original tensor is already on the CPU. However, for some reason it seems like they're creating a new tensor. Can you assign one tensor to a temporary variable (e.g., the first row of `df_final["title_cat_embed2_1"]`), perform `.detach().cpu().numpy()`, and check whether more memory is occupied? – Hayoung May 26 '22 at 08:49
  • Attaching the result of processing single tensor and measuring memory consumption. One tensor conversion is abruptly consuming 799mb of RAM. – ak1234 May 26 '22 at 09:46
  • It seems that one or more of the `.detach()`, `.cpu()`, `.numpy()` operations on a `pandas.DataFrame` copies some data. Maybe deleting it after copying would be a solution. – Hayoung May 27 '22 at 04:03
  • Any idea what might be the underlying reason for this? I tried copying the dataframe and then deleting it but it did not release any memory. – ak1234 May 27 '22 at 06:25
  • All three operations do not use new memory space under certain conditions. Is `first_row` a torch tensor? And also I use `torch.cuda.empty_cache()` after deleting variables if they were on cuda, but have no idea when the case is of CPU. Maybe [this](https://stackoverflow.com/questions/15455048/releasing-memory-in-python) question/answer can help you to properly release CPU memory after deleting the object. – Hayoung May 27 '22 at 06:38
  • yes first_row is a torch tensor – ak1234 May 27 '22 at 06:47
  • I've tried assigning a new tensor `a` and called `a.detach().cpu().numpy()`. No memory had increased. I suspect that the tensor wrapped by `DataFrame` acts differently. – Hayoung May 27 '22 at 07:02
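To the point raised in the comments, this is the behavior I would expect for a tensor that is already on the CPU: `.cpu()` is a no-op and `.numpy()` returns a view sharing the tensor's storage, so no new memory should be allocated. A small check (assumes `torch` is importable; a CUDA tensor, by contrast, must be copied to host memory by `.cpu()`):

```python
import torch

t = torch.zeros(4)
a = t.detach().cpu().numpy()  # CPU tensor: no copy, shared storage

a[0] = 1.0                    # mutate through the ndarray...
print(t[0].item())            # ...and the change is visible in the tensor: 1.0
```

Which makes it all the more puzzling that the same chain, applied row by row over the DataFrame column, consumes gigabytes.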
