In my training loop, I load each batch of data into CPU memory and then transfer it to the GPU:
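import torch
from torch.utils.data import DataLoader

# device is assumed to be the GPU when one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True, num_workers=4, pin_memory=True)
for inputs, labels in train_loader:
    # Copy each batch from host memory to the GPU
    inputs, labels = inputs.to(device), labels.to(device)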
Loading data this way is very time-consuming. Is there any way to load the data directly onto the GPU, without the per-batch transfer step?
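To make the question concrete, here is a minimal sketch of the kind of thing I have in mind, assuming the whole dataset fits in GPU memory: copy the tensors to the device once, then slice batches from them there. The helper `gpu_batches` and the random stand-in tensors are hypothetical, not part of my actual code, and I have not checked whether this is actually faster for my data.

```python
import torch

def gpu_batches(inputs, labels, batch_size, shuffle=True):
    """Yield mini-batches from tensors that already live on the target device."""
    n = inputs.shape[0]
    # Build the index order on the same device as the data,
    # so indexing does not need a host-to-device copy.
    if shuffle:
        order = torch.randperm(n, device=inputs.device)
    else:
        order = torch.arange(n, device=inputs.device)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        yield inputs[idx], labels[idx]

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in data; the real tensors would come from train_dataset.
all_inputs = torch.randn(1000, 16)
all_labels = torch.randint(0, 10, (1000,))

# One-time transfer of the whole dataset instead of one transfer per batch.
all_inputs, all_labels = all_inputs.to(device), all_labels.to(device)

batch_shapes = []
for inputs, labels in gpu_batches(all_inputs, all_labels, batch_size=128):
    # inputs and labels are already on `device` here; no .to(device) call needed.
    batch_shapes.append((inputs.shape[0], labels.shape[0]))
```

This gives up the `DataLoader` worker processes and any per-sample transforms, so it probably only makes sense when the data is already a plain tensor. If the dataset does not fit on the GPU, keeping `pin_memory=True` and calling `.to(device, non_blocking=True)` may at least let the copy overlap with computation.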