I want to train a model on this dataset: https://decode.mit.edu/projects/biked/ and I followed this tutorial to do so: https://github.com/jeffheaton/present/blob/master/youtube/gan/colab_gan_train.ipynb

The problem is that once I run the training command, the tick stays frozen at 0. Is that normal? Should I keep it running? It is taking forever. I tried changing the number of workers to 2, but the problem persists. I'm using the free tier of Colab, so I'm not sure whether that is the cause. I also tried my own datasets, and they all show the same problem.

Training options:
{
  "num_gpus": 1,
  "image_snapshot_ticks": 10,
  "network_snapshot_ticks": 10,
  "metrics": [
    "fid50k_full"
  ],
  "random_seed": 0,
  "training_set_kwargs": {
    "class_name": "training.dataset.ImageFolderDataset",
    "path": "/content/drive/MyDrive/SquareImages.zip",
    "use_labels": false,
    "max_size": 42799,
    "xflip": false,
    "resolution": 1024
  },
  "data_loader_kwargs": {
    "pin_memory": true,
    "num_workers": 3,
    "prefetch_factor": 2
  },
  "G_kwargs": {
    "class_name": "training.networks.Generator",
    "z_dim": 512,
    "w_dim": 512,
    "mapping_kwargs": {
      "num_layers": 2
    },
    "synthesis_kwargs": {
      "channel_base": 32768,
      "channel_max": 512,
      "num_fp16_res": 4,
      "conv_clamp": 256
    }
  },
  "D_kwargs": {
    "class_name": "training.networks.Discriminator",
    "block_kwargs": {},
    "mapping_kwargs": {},
    "epilogue_kwargs": {
      "mbstd_group_size": 4
    },
    "channel_base": 32768,
    "channel_max": 512,
    "num_fp16_res": 4,
    "conv_clamp": 256
  },
  "G_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "lr": 0.002,
    "betas": [
      0,
      0.99
    ],
    "eps": 1e-08
  },
  "D_opt_kwargs": {
    "class_name": "torch.optim.Adam",
    "lr": 0.002,
    "betas": [
      0,
      0.99
    ],
    "eps": 1e-08
  },
  "loss_kwargs": {
    "class_name": "training.loss.StyleGAN2Loss",
    "r1_gamma": 52.4288
  },
  "total_kimg": 25000,
  "batch_size": 4,
  "batch_gpu": 4,
  "ema_kimg": 1.25,
  "ema_rampup": 0.05,
  "ada_target": 0.6,
  "augment_kwargs": {
    "class_name": "training.augment.AugmentPipe",
    "xflip": 1,
    "rotate90": 1,
    "xint": 1,
    "scale": 1,
    "rotate": 1,
    "aniso": 1,
    "xfrac": 1,
    "brightness": 1,
    "contrast": 1,
    "lumaflip": 1,
    "hue": 1,
    "saturation": 1
  },
  "run_dir": "/content/drive/MyDrive/exp/00003-SquareImages-auto1"
}

Output directory:   /content/drive/MyDrive/exp/00003-SquareImages-auto1
Training data:      /content/drive/MyDrive/SquareImages.zip
Training duration:  25000 kimg
Number of GPUs:     1
Number of images:   42799
Image resolution:   1024
Conditional model:  False
Dataset x-flips:    False

Creating output directory...
Launching processes...
Loading training set...
/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py:474: UserWarning: This DataLoader will create 3 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(

Num images:  42799
Image shape: [3, 1024, 1024]
Label shape: [0]

Constructing networks...
Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.

Generator              Parameters  Buffers  Output shape         Datatype
---                    ---         ---      ---                  ---     
mapping.fc0            262656      -        [4, 512]             float32 
mapping.fc1            262656      -        [4, 512]             float32 
mapping                -           512      [4, 18, 512]         float32 
synthesis.b4.conv1     2622465     32       [4, 512, 4, 4]       float32 
synthesis.b4.torgb     264195      -        [4, 3, 4, 4]         float32 
synthesis.b4:0         8192        16       [4, 512, 4, 4]       float32 
synthesis.b4:1         -           -        [4, 512, 4, 4]       float32 
synthesis.b8.conv0     2622465     80       [4, 512, 8, 8]       float32 
synthesis.b8.conv1     2622465     80       [4, 512, 8, 8]       float32 
synthesis.b8.torgb     264195      -        [4, 3, 8, 8]         float32 
synthesis.b8:0         -           16       [4, 512, 8, 8]       float32 
synthesis.b8:1         -           -        [4, 512, 8, 8]       float32 
synthesis.b16.conv0    2622465     272      [4, 512, 16, 16]     float32 
synthesis.b16.conv1    2622465     272      [4, 512, 16, 16]     float32 
synthesis.b16.torgb    264195      -        [4, 3, 16, 16]       float32 
synthesis.b16:0        -           16       [4, 512, 16, 16]     float32 
synthesis.b16:1        -           -        [4, 512, 16, 16]     float32 
synthesis.b32.conv0    2622465     1040     [4, 512, 32, 32]     float32 
synthesis.b32.conv1    2622465     1040     [4, 512, 32, 32]     float32 
synthesis.b32.torgb    264195      -        [4, 3, 32, 32]       float32 
synthesis.b32:0        -           16       [4, 512, 32, 32]     float32 
synthesis.b32:1        -           -        [4, 512, 32, 32]     float32 
synthesis.b64.conv0    2622465     4112     [4, 512, 64, 64]     float32 
synthesis.b64.conv1    2622465     4112     [4, 512, 64, 64]     float32 
synthesis.b64.torgb    264195      -        [4, 3, 64, 64]       float32 
synthesis.b64:0        -           16       [4, 512, 64, 64]     float32 
synthesis.b64:1        -           -        [4, 512, 64, 64]     float32 
synthesis.b128.conv0   1442561     16400    [4, 256, 128, 128]   float16 
synthesis.b128.conv1   721409      16400    [4, 256, 128, 128]   float16 
synthesis.b128.torgb   132099      -        [4, 3, 128, 128]     float16 
synthesis.b128:0       -           16       [4, 256, 128, 128]   float16 
synthesis.b128:1       -           -        [4, 256, 128, 128]   float32 
synthesis.b256.conv0   426369      65552    [4, 128, 256, 256]   float16 
synthesis.b256.conv1   213249      65552    [4, 128, 256, 256]   float16 
synthesis.b256.torgb   66051       -        [4, 3, 256, 256]     float16 
synthesis.b256:0       -           16       [4, 128, 256, 256]   float16 
synthesis.b256:1       -           -        [4, 128, 256, 256]   float32 
synthesis.b512.conv0   139457      262160   [4, 64, 512, 512]    float16 
synthesis.b512.conv1   69761       262160   [4, 64, 512, 512]    float16 
synthesis.b512.torgb   33027       -        [4, 3, 512, 512]     float16 
synthesis.b512:0       -           16       [4, 64, 512, 512]    float16 
synthesis.b512:1       -           -        [4, 64, 512, 512]    float32 
synthesis.b1024.conv0  51297       1048592  [4, 32, 1024, 1024]  float16 
synthesis.b1024.conv1  25665       1048592  [4, 32, 1024, 1024]  float16 
synthesis.b1024.torgb  16515       -        [4, 3, 1024, 1024]   float16 
synthesis.b1024:0      -           16       [4, 32, 1024, 1024]  float16 
synthesis.b1024:1      -           -        [4, 32, 1024, 1024]  float32 
---                    ---         ---      ---                  ---     
Total                  28794124    2797104  -                    -       


Discriminator  Parameters  Buffers  Output shape         Datatype
---            ---         ---      ---                  ---     
b1024.fromrgb  128         16       [4, 32, 1024, 1024]  float16 
b1024.skip     2048        16       [4, 64, 512, 512]    float16 
b1024.conv0    9248        16       [4, 32, 1024, 1024]  float16 
b1024.conv1    18496       16       [4, 64, 512, 512]    float16 
b1024          -           16       [4, 64, 512, 512]    float16 
b512.skip      8192        16       [4, 128, 256, 256]   float16 
b512.conv0     36928       16       [4, 64, 512, 512]    float16 
b512.conv1     73856       16       [4, 128, 256, 256]   float16 
b512           -           16       [4, 128, 256, 256]   float16 
b256.skip      32768       16       [4, 256, 128, 128]   float16 
b256.conv0     147584      16       [4, 128, 256, 256]   float16 
b256.conv1     295168      16       [4, 256, 128, 128]   float16 
b256           -           16       [4, 256, 128, 128]   float16 
b128.skip      131072      16       [4, 512, 64, 64]     float16 
b128.conv0     590080      16       [4, 256, 128, 128]   float16 
b128.conv1     1180160     16       [4, 512, 64, 64]     float16 
b128           -           16       [4, 512, 64, 64]     float16 
b64.skip       262144      16       [4, 512, 32, 32]     float32 
b64.conv0      2359808     16       [4, 512, 64, 64]     float32 
b64.conv1      2359808     16       [4, 512, 32, 32]     float32 
b64            -           16       [4, 512, 32, 32]     float32 
b32.skip       262144      16       [4, 512, 16, 16]     float32 
b32.conv0      2359808     16       [4, 512, 32, 32]     float32 
b32.conv1      2359808     16       [4, 512, 16, 16]     float32 
b32            -           16       [4, 512, 16, 16]     float32 
b16.skip       262144      16       [4, 512, 8, 8]       float32 
b16.conv0      2359808     16       [4, 512, 16, 16]     float32 
b16.conv1      2359808     16       [4, 512, 8, 8]       float32 
b16            -           16       [4, 512, 8, 8]       float32 
b8.skip        262144      16       [4, 512, 4, 4]       float32 
b8.conv0       2359808     16       [4, 512, 8, 8]       float32 
b8.conv1       2359808     16       [4, 512, 4, 4]       float32 
b8             -           16       [4, 512, 4, 4]       float32 
b4.mbstd       -           -        [4, 513, 4, 4]       float32 
b4.conv        2364416     16       [4, 512, 4, 4]       float32 
b4.fc          4194816     -        [4, 512]             float32 
b4.out         513         -        [4, 1]               float32 
---            ---         ---      ---                  ---     
Total          29012513    544      -                    -       

Setting up augmentation...
Distributing across 1 GPUs...
Setting up training phases...
Exporting sample images...
Initializing logs...
Training for 25000 kimg...

tick 0     kimg 0.0      time 1m 31s       sec/tick 14.4    sec/kimg 3595.75 maintenance 76.8   cpumem 4.82   gpumem 11.32  augment 0.000
Evaluating metrics...
/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py:474: UserWarning: This DataLoader will create 3 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
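
The `UserWarning` in the log says the DataLoader was asked for 3 workers while the suggested maximum on this machine is 2. A minimal sketch of capping the worker count at the CPU count is shown here; `pick_num_workers` is a hypothetical helper, not part of the training script, and `os.cpu_count()` only approximates the limit PyTorch reports:

```python
import os


def pick_num_workers(requested=3):
    """Cap a requested DataLoader worker count at the number of visible CPUs.

    Hypothetical helper for illustration; the training script does not define it.
    """
    # os.cpu_count() can return None, so fall back to a single worker.
    available = os.cpu_count() or 1
    # Never return fewer than one worker or more than the machine offers.
    return max(1, min(requested, available))


if __name__ == "__main__":
    print("num_workers to use:", pick_num_workers(3))
```

This would only silence the warning and avoid oversubscribing the CPU; as noted in the question, lowering the worker count to 2 did not change the tick behaviour.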

Train Model using StyleGAN2 in Google CoLab - Model Training Freezing

  • I was working on a T4 GPU, which is very slow. It turned out that there was no issue; it was just a matter of time. Problem solved. – ii_7sn001 Feb 05 '23 at 15:26
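
To get a rough feel for how long "a matter of time" can be, the `tick 0` status line from the log can be read numerically. The sketch below parses that line and derives how many images the tick covered. The value `kimg_per_tick=4.0` is an assumption about the training script's tick length, not something the log states, and the `tick 0` rate includes start-up overhead, so the result is a rough guess rather than a measurement:

```python
import re

# Status line copied from the training log in the question.
TICK_LINE = (
    "tick 0     kimg 0.0      time 1m 31s       sec/tick 14.4    "
    "sec/kimg 3595.75 maintenance 76.8   cpumem 4.82   gpumem 11.32  augment 0.000"
)


def parse_tick_line(line):
    """Return selected numeric fields from one status line as floats."""
    fields = {}
    for key in ("tick", "kimg", "sec/tick", "sec/kimg", "maintenance"):
        # (?<!\S) makes "tick" match only as its own field, not inside "sec/tick".
        match = re.search(r"(?<!\S)" + re.escape(key) + r"\s+([0-9.]+)", line)
        if match is None:
            raise ValueError(f"field {key!r} not found")
        fields[key] = float(match.group(1))
    return fields


def images_in_tick(fields):
    """Number of images the tick covered, derived from sec/tick and sec/kimg."""
    return fields["sec/tick"] / fields["sec/kimg"] * 1000.0


def hours_per_tick(sec_per_kimg, kimg_per_tick=4.0):
    """Wall-clock hours for one tick; kimg_per_tick is an assumed value."""
    return sec_per_kimg * kimg_per_tick / 3600.0


if __name__ == "__main__":
    fields = parse_tick_line(TICK_LINE)
    print("images covered by tick 0:", round(images_in_tick(fields)))
    print("hours per tick at that rate:", round(hours_per_tick(fields["sec/kimg"]), 2))
```

With the numbers from the log, tick 0 covers about 4 images, which matches `batch_size: 4`, so the reported `sec/kimg` comes from a single batch and is unlikely to reflect steady-state speed. The metric evaluation (`fid50k_full`) that starts right after tick 0 adds further time before the next status line appears.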
