
When trying to train a ResNet, I get the error below. Any help as to why this happens would be appreciated. It occurs when I try to iterate through the DataLoader:

File "C:\Users\JCout\AppData\Local\Temp/ipykernel_2540/2174299330.py", line 1, in <module>
    runfile('C:/Users/JCout/Documents/GitHub/Hybrid_resnet/transfer_learning.py', wdir='C:/Users/JCout/Documents/GitHub/Hybrid_resnet')

  File "C:\Users\JCout\anaconda3\lib\site-packages\debugpy\_vendored\pydevd\_pydev_bundle\pydev_umd.py", line 167, in runfile
    execfile(filename, namespace)

  File "C:\Users\JCout\anaconda3\lib\site-packages\debugpy\_vendored\pydevd\_pydev_imps\_pydev_execfile.py", line 25, in execfile
    exec(compile(contents + "\n", file, 'exec'), glob, loc)

  File "C:/Users/JCout/Documents/GitHub/Hybrid_resnet/transfer_learning.py", line 24, in <module>
    model, train_loss, test_loss = train.train_model(training, testing,

  File "C:\Users\JCout\Documents\GitHub\Hybrid_resnet\train.py", line 70, in train_model
    train_stats = train.train_step(model, criterion, optimizer, train_loader)

  File "C:\Users\JCout\Documents\GitHub\Hybrid_resnet\train.py", line 121, in train_step
    for i, (x_imgs, labels) in enumerate(train_loader):

  File "C:\Users\JCout\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 359, in __iter__
    return self._get_iterator()

  File "C:\Users\JCout\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 305, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)

  File "C:\Users\JCout\anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 918, in __init__
    w.start()

  File "C:\Users\JCout\anaconda3\lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)

  File "C:\Users\JCout\anaconda3\lib\multiprocessing\context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)

  File "C:\Users\JCout\anaconda3\lib\multiprocessing\context.py", line 327, in _Popen
    return Popen(process_obj)

  File "C:\Users\JCout\anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)

  File "C:\Users\JCout\anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)

OSError: [Errno 22] Invalid argument

I'm also getting this traceback after the error, which I find a bit weird since I'm not using pickle anywhere myself. The data consists of 2 .tif files and 2 .mat files for the data/targets.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\JCout\anaconda3\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\JCout\anaconda3\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\JCout\anaconda3\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\JCout\anaconda3\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
  • Your question is missing the top few lines of the first traceback. Adding your code would also help. OSError 22 indicates a malformed file path. Is there a spot in your code where you are specifying where this pickle file should go? – 0x5453 Dec 03 '21 at 17:07
  • That's really strange. OSError 22 is an "invalid path" error. But the file it's opening is actually a pipe to the child process, so it doesn't really have a path. Perhaps you're having the same problem as the person in this thread? https://stackoverflow.com/questions/23688492/oserror-errno-22-invalid-argument-in-subprocess Which would suggest that the child process is exiting before the pickled data is written to it. – Nick ODell Dec 03 '21 at 17:09
  • @0x5453 I updated the post to include the entire traceback that I'm getting. I also added the fact that I'm not using pickle at any point within my scripts. I'm also not currently saving any data, only loading from certain directories (which works; I'm able to load the data in with no problem). – Jonathan Couture Dec 03 '21 at 17:26
  • @NickODell I can't really follow what's happening in the link you provided, to be frank, but it doesn't seem to be caused by the same thing. – Jonathan Couture Dec 03 '21 at 17:32
  • @JonathanCouture If you read the source code of multiprocessing/reduction.py and multiprocessing/popen_spawn_win32.py (the top two files in your traceback), you'll see that multiprocessing is establishing a pipe between parent and child process, and the pickled data is moving through that pipe. So even if you're not using pipes directly, they're still getting used by a library you're using. – Nick ODell Dec 03 '21 at 17:43
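As the comments describe, on Windows multiprocessing uses the spawn start method: each DataLoader worker process re-imports the main module and receives its state over a pipe as a pickle stream (the reduction.dump call in the traceback). One common way this fails is module-level code that builds the DataLoader without a __main__ guard. A minimal sketch of the guarded pattern, using a hypothetical random dataset in place of the question's .tif/.mat data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Hypothetical stand-in for the question's .tif images / .mat targets.
    dataset = TensorDataset(torch.randn(8, 1, 16, 16), torch.zeros(8))
    # On Windows, worker processes are only safe to start from inside the
    # __main__ guard below, because spawn re-imports this module in every
    # worker; unguarded module-level DataLoader creation can break here.
    loader = DataLoader(dataset, batch_size=4, num_workers=2)
    return sum(x_imgs.shape[0] for x_imgs, labels in loader)

if __name__ == "__main__":
    print(main())  # total number of samples iterated
```

Whether this guard is the fix in this particular case isn't confirmed in the thread; it is the standard pattern the multiprocessing documentation requires for spawn-based platforms.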

1 Answer


Not entirely sure what was happening, but I had

pin_memory = True
num_workers = 4

set on the DataLoader. Removing these two arguments removed the error.
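For reference, a minimal sketch of a DataLoader configured this way, with a stand-in random dataset since the original .tif/.mat loading code isn't shown. Setting num_workers=0 keeps all loading in the main process, so nothing has to be pickled to spawned workers (the step that raised OSError: [Errno 22]):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset; the question loads .tif images and .mat targets instead.
dataset = TensorDataset(torch.randn(8, 3, 32, 32), torch.randint(0, 2, (8,)))

# num_workers=0 loads batches in the main process, avoiding the Windows
# spawn + pickle path entirely; pin_memory=False matches the fix above.
train_loader = DataLoader(dataset, batch_size=4, shuffle=True,
                          num_workers=0, pin_memory=False)

for i, (x_imgs, labels) in enumerate(train_loader):
    print(i, x_imgs.shape, labels.shape)
```

This trades away parallel data loading for reliability; on Windows, multi-worker loading also requires the DataLoader iteration to live under an `if __name__ == "__main__":` guard.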