I am trying to understand why re-assigning the forward method of a PyTorch model object leads to the following error in a multi-GPU prediction job (the multi-GPU setup is configured automatically by the Hugging Face Trainer):
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument index in method wrapper_CUDA__index_select)
This happens when I re-assign the forward method of my model object like so:
from functools import partial
model = CustomModel(...)
partial_kwargs = {'key1': value1, ...}
model.forward = partial(model.forward, **partial_kwargs)
If I instead pass partial_kwargs as constructor kwargs of CustomModel, I don't get the CUDA device error above.
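One possible explanation can be sketched without any GPUs. The example below is only an illustration and rests on an assumption: that the multi-GPU wrapper builds per-device replicas by shallow-copying the model's instance attributes. `TinyModel`, `shallow_replica`, and the `device` string are hypothetical stand-ins, not part of PyTorch or of my actual code. Under that assumption, a `forward` stored on the instance travels with the copy but remains bound to the original object, so the replica ends up running against the original's state.

```python
from functools import partial


class TinyModel:
    """Hypothetical stand-in for CustomModel; `device` mimics where the weights live."""

    def __init__(self, device):
        self.device = device

    def forward(self, x, scale=1):
        # Report which object's state was actually used for the computation.
        return f"{x * scale} computed on {self.device}"


def shallow_replica(model, device):
    # Assumption: mimics a replication step that copies the instance
    # attributes onto a fresh object and then "moves" it to another device.
    replica = model.__class__.__new__(model.__class__)
    replica.__dict__ = model.__dict__.copy()
    replica.device = device
    return replica


# Unpatched: forward is looked up on the class and binds to the replica.
plain = TinyModel("cuda:0")
plain_replica = shallow_replica(plain, "cuda:1")

# Patched: forward lives in the instance __dict__, bound to the original.
patched = TinyModel("cuda:0")
patched.forward = partial(patched.forward, scale=2)
patched_replica = shallow_replica(patched, "cuda:1")

print(plain_replica.forward(3))    # 3 computed on cuda:1
print(patched_replica.forward(3))  # 6 computed on cuda:0
```

If this is what happens in the real job, it would explain why passing the kwargs through the constructor avoids the error: in that case `forward` stays a class-level method and each replica binds it to itself.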
Please let me know if anything in the description is unclear and I can add more context. This question seems related but is not the same, since I did not explicitly assign specific CUDA devices anywhere in the code.