
I am training an NLP model using code adapted from https://github.com/huggingface/transformers/blob/master/examples/run_glue.py. I'm working with Docker Toolbox on Windows 10, CPU only. The code works fine locally, and I built the Docker image successfully. However, when I ran "docker run $IMAGE_URI", I got the following error at the training step:

  File "xlnet/train_config.py", line 318, in <module>
    global_step, tr_loss = train(train_dataset, model, tokenizer)

  File "xlnet/train_config.py", line 214, in train
    outputs = model(**inputs) 

...

  File "/usr/local/lib/python3.7/site-packages/pytorch_transformers/modeling_xlnet.py", line 383, in rel_shift
    x = torch.index_select(x, 1, torch.arange(klen, device=x.device, dtype=torch.long))

RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 182452224 bytes. Error code 12 (Cannot allocate memory)

When I run 'docker info', it shows "CPUs: 8, Total Memory: 7.793GiB". That should be more than enough for the 182452224-byte (174 MiB) allocation that failed.
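The total reported by 'docker info' is not necessarily what the training process can use inside the container. A minimal sketch for checking this from inside the container, assuming a Linux container that exposes /proc/meminfo and one of the standard cgroup memory files (the helper names `parse_meminfo`, `read_cgroup_limit`, and `report` are made up for illustration):

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict; 'kB' values are converted to bytes."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if not parts or not parts[0].isdigit():
            continue
        value = int(parts[0])
        # Most fields carry a "kB" unit; a few (e.g. HugePages_Total) are plain counts.
        if len(parts) == 2 and parts[1] == "kB":
            value *= 1024
        fields[name.strip()] = value
    return fields


def read_cgroup_limit():
    """Return the container memory limit in bytes, or None if none is found."""
    candidates = (
        "/sys/fs/cgroup/memory.max",                    # cgroup v2
        "/sys/fs/cgroup/memory/memory.limit_in_bytes",  # cgroup v1
    )
    for path in candidates:
        if os.path.exists(path):
            with open(path) as f:
                value = f.read().strip()
            # cgroup v2 writes the literal string "max" when there is no limit.
            return None if value == "max" else int(value)
    return None


def report():
    """Print total/available memory and the cgroup limit as seen by this process."""
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        for key in ("MemTotal", "MemAvailable"):
            if key in info:
                print(f"{key}: {info[key] / 2**30:.2f} GiB")
    limit = read_cgroup_limit()
    if limit is None:
        print("cgroup limit: none found")
    else:
        print(f"cgroup limit: {limit / 2**30:.2f} GiB")


if __name__ == "__main__":
    report()
```

Calling `report()` right before the training loop would show whether MemAvailable is already low at that point. On cgroup v1, an unlimited container shows a very large number rather than "max".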

Then I tried allocating 10 GB of memory. The error message no longer appears, but the run just exits at the same place without continuing training.
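A silent exit at the same step could mean the process was killed from outside rather than failing in Python, for example by the kernel's out-of-memory killer; that is an assumption, not something the output above confirms. The container's exit status (shown by "docker ps -a" in the STATUS column) can help tell these cases apart, since statuses above 128 conventionally mean "killed by signal (status - 128)". A small sketch for decoding it, where `describe_exit_code` is a hypothetical helper and the signal table covers only a few common Linux signals:

```python
# Common Linux signal numbers; a hand-written subset so the helper
# behaves the same on any host OS.
KNOWN_SIGNALS = {
    6: "SIGABRT",
    9: "SIGKILL",
    11: "SIGSEGV",
    15: "SIGTERM",
}


def describe_exit_code(code):
    """Return a short human-readable description of a container exit status."""
    if code == 0:
        return "exited normally"
    if code > 128:
        signum = code - 128
        name = KNOWN_SIGNALS.get(signum, f"signal {signum}")
        return f"killed by {name}"
    return f"exited with error status {code}"


if __name__ == "__main__":
    for status in (0, 1, 137, 139):
        print(status, "->", describe_exit_code(status))
```

A status of 137 (SIGKILL) is what an out-of-memory kill typically produces, although a manual "docker kill" gives the same value, so it is a hint rather than proof. "docker inspect" also reports an OOMKilled flag under the container's State.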

Stormblessed
Suri 07
  • Have you tried "docker run --memory=8G $IMAGE_URI"? There's a similar question about lack of memory for docker containers [here](https://stackoverflow.com/questions/44533319/how-to-assign-more-memory-to-docker-container) – Joe Oct 21 '19 at 04:04
  • Thank you for replying. I tried this, but still got the same error. – Suri 07 Oct 21 '19 at 19:24

0 Answers