I am training an NLP model using code adapted from https://github.com/huggingface/transformers/blob/master/examples/run_glue.py. I'm working with Docker Toolbox on Windows 10, CPU only. The code works fine locally, and the Docker image builds successfully. However, when I run "docker run $IMAGE_URI", I get the following error at the training step:
File "xlnet/train_config.py", line 318, in <module>
global_step, tr_loss = train(train_dataset, model, tokenizer)
File "xlnet/train_config.py", line 214, in train
outputs = model(**inputs)
...
File "/usr/local/lib/python3.7/site-packages/pytorch_transformers/modeling_xlnet.py", line 383, in rel_shift
x = torch.index_select(x, 1, torch.arange(klen, device=x.device, dtype=torch.long))
RuntimeError: [enforce fail at CPUAllocator.cpp:64] . DefaultCPUAllocator: can't allocate memory: you tried to allocate 182452224 bytes. Error code 12 (Cannot allocate memory)
When I run "docker info", it shows "CPUs: 8, Total Memory: 7.793GiB". The failed allocation is only 182452224 bytes (about 174 MiB), so that should be enough.
Then I tried allocating 10GB of memory. The error message no longer appears, but the process just exits at the same place without continuing training.
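To narrow this down, one thing that could be checked is how much memory the container actually sees at the point of failure, since "docker info" reports the total for the Docker host rather than what is free inside the container. Below is a minimal sketch, not part of my training script. It assumes a Linux container where /proc/meminfo is readable and where the memory limit is exposed at one of the usual cgroup v1 or v2 paths. The helper names (parse_meminfo, read_cgroup_limit, report) are made up for illustration.

```python
# Candidate cgroup files holding the container memory limit
# (cgroup v1 first, then cgroup v2). Which one exists depends on the host,
# so treating these paths as present is an assumption.
CGROUP_LIMIT_FILES = (
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",
    "/sys/fs/cgroup/memory.max",
)


def bytes_to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / (1024 * 1024)


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: bytes}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts or not parts[0].isdigit():
            continue  # skip lines that are not "Name: <number> [kB]"
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024  # /proc/meminfo reports most fields in kB
        info[name.strip()] = value
    return info


def read_cgroup_limit(paths=CGROUP_LIMIT_FILES):
    """Return the cgroup memory limit in bytes, or None if unknown.

    cgroup v2 writes the literal string "max" when there is no limit;
    cgroup v1 writes a very large number instead, which is returned as-is.
    """
    for path in paths:
        try:
            with open(path) as f:
                raw = f.read().strip()
        except OSError:
            continue  # file missing or unreadable, try the next candidate
        return int(raw) if raw.isdigit() else None
    return None


def report(requested_bytes):
    """Print the requested allocation next to what the container reports."""
    try:
        with open("/proc/meminfo") as f:
            meminfo = parse_meminfo(f.read())
    except OSError:
        meminfo = {}  # not on Linux, or /proc is not mounted

    print("requested: %.1f MiB" % bytes_to_mib(requested_bytes))
    for field in ("MemTotal", "MemAvailable"):
        if field in meminfo:
            print("%s: %.1f MiB" % (field, bytes_to_mib(meminfo[field])))
    limit = read_cgroup_limit()
    if limit is not None:
        print("cgroup limit: %.1f MiB" % bytes_to_mib(limit))


if __name__ == "__main__":
    # 182452224 is the allocation size from the error message above.
    report(182452224)
```

Calling report(...) right before the outputs = model(**inputs) line would show whether MemAvailable is already close to zero at that step, which would explain a failure even though the total looks large enough.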