I am new to PyTorch and am running into an expected error. The overall context is trying to build a building segmentation model off of Spacenet imagery. I am forked off of this repo from someone at Microsoft AI who built a segmentation model, and I am just trying to re-run her training scripts.
I've been able to download the data, and do the pre-processing. My issue comes when trying to actually train the model, I am trying to iterate through my DataLoader, and I get the following error message:
RuntimeError: Expected object of scalar type unsigned char but got scalar type float for sequence element 9.
Snippets of code that are useful:
I have a dataset.py
that creates the SpaceNetDataset
class and looks like:
import os
# Ignore warnings
import warnings
import numpy as np
from PIL import Image
import torch
from torch.utils.data import Dataset
warnings.filterwarnings('ignore')
class SpaceNetDataset(Dataset):
"""Class representing a SpaceNet dataset, such as a training set."""
def __init__(self, root_dir, splits=['trainval', 'test'], transform=None):
"""
Args:
root_dir (string): Directory containing folder annotations and .txt files with the
train/val/test splits
splits: ['trainval', 'test'] - the SpaceNet utilities code would create these two
splits while converting the labels from polygons to mask annotations. The two
splits are created after chipping larger images into the required input size with
some overlaps. Thus to have splits that do not have overlapping areas, we manually
split the images (not chips) into train/val/test using utils/split_train_val_test.py,
followed by using the SpaceNet utilities to annotate each folder, and combine the
trainval and test splits it creates inside each folder.
transform (callable, optional): Optional transform to be applied
on a sample.
"""
self.root_dir = root_dir
self.transform = transform
self.image_list = []
self.xml_list = []
data_files = []
for split in splits:
with open(os.path.join(root_dir, split + '.txt')) as f:
data_files.extend(f.read().splitlines())
for line in data_files:
line = line.split(' ')
image_name = line[0].split('/')[-1]
xml_name = line[1].split('/')[-1]
self.image_list.append(image_name)
self.xml_list.append(xml_name)
def __len__(self):
return len(self.image_list)
def __getitem__(self, idx):
img_path = os.path.join(self.root_dir, 'RGB-PanSharpen', self.image_list[idx])
target_path = os.path.join(self.root_dir, 'annotations', self.image_list[idx].replace('.tif', 'segcls.tif'))
image = np.array(Image.open(img_path))
target = np.array(Image.open(target_path))
target[target == 100] = 1 # building interior
target[target == 255] = 2 # border
sample = {'image': image, 'target': target, 'image_name': self.image_list[idx]}
if self.transform:
sample = self.transform(sample)
return sample
To create the DataLoader, I have something like:
dset_train = SpaceNetDataset(data_path_train, split_tags, transform=T.Compose([ToTensor()]))
loader_train = DataLoader(dset_train, batch_size=train_batch_size, shuffle=True,
num_workers=num_workers)
I then iterate over the data loader by doing something like:
for batch in loader_train:
image_tensors = batch['image']
images = batch['image'].cpu().numpy()
break # take the first shuffled batch
but then I get the error:
Traceback (most recent call last):
File "training/train_aml.py", line 137, in <module> sample_images_train, sample_images_train_tensors = get_sample_images(which_set='train')
File "training/train_aml.py", line 123, in get_sample_images for i, batch in enumerate(loader):
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 345, in __next__ data = self._next_data() File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 856, in _next_data return self._process_data(data)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 881, in _process_data data.reraise()
File "/usr/local/lib/python3.6/dist-packages/torch/_utils.py", line 395, in reraise raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0. Original Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop data = fetcher.fetch(index)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch return self.collate_fn(data)
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/collate.py", line 74, in default_collate return {key: default_collate([d[key] for d in batch]) for key in elem}
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/collate.py", line 74, in <dictcomp> return {key: default_collate([d[key] for d in batch]) for key in elem}
File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/collate.py", line 55, in default_collate return torch.stack(batch, 0, out=out)
RuntimeError: Expected object of scalar type unsigned char but got scalar type float for sequence element 9.
The error seems quite similar to this one, although I did try a similar solution by casting:
dtype = torch.cuda.CharTensor if torch.cuda.is_available() else torch.CharTensor
for batch in loader:
batch['image'] = batch['image'].type(dtype)
batch['target'] = batch['target'].type(dtype)
but I end up with the same error.
A couple of other things that are weird:
- This seems to be non-deterministic. Most of the time I get this error, but some times the code keeps running (not sure why)
- The "Sequence Element" number at the end of the error message keeps changing. In this case it was "sequence element 9" sometimes it's "sequence element 2", etc. Not sure why.