
I am doing image classification with PyTorch. I have a separate images folder, plus train and test CSV files containing image IDs and labels. I have no idea how to combine the images with their IDs and convert them into tensors.

  1. train.csv: contains the image IDs (4325.jpg, 2345.jpg, and so on) and their labels (cat, dog).
  2. Image_data: contains all the images, each named by its ID.
desertnaut
rts

1 Answer


You can create a custom dataset class by inheriting from PyTorch's torch.utils.data.Dataset.

The custom dataset class below assumes the following:

  • The CSV file has a header row and two comma-separated columns, filename and label:

filename,label
4325.jpg,cat
2345.jpg,dog
  • All images are inside the images folder.
import os

import pandas as pd
import torch
from PIL import Image


class CustomDataset(torch.utils.data.Dataset):
    def __init__(self, csv_path, images_folder, transform=None):
        self.df = pd.read_csv(csv_path)
        self.images_folder = images_folder
        self.transform = transform
        # Map each class name to an integer label
        self.class2index = {"cat": 0, "dog": 1}

    def __len__(self):
        return len(self.df)

    def __getitem__(self, index):
        # Column names must match the CSV header (filename, label)
        filename = self.df.loc[index, "filename"]
        label = self.class2index[self.df.loc[index, "label"]]
        image = Image.open(os.path.join(self.images_folder, filename))
        if self.transform is not None:
            image = self.transform(image)
        return image, label
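
The class-to-index mapping in the class is hard-coded for two classes. If you have more labels, one option is to build the mapping from the label column of the CSV itself. This is only a rough sketch: the inline CSV text and the names classes and csv_text are made up for illustration, and it assumes the column is called label.

```python
import io

import pandas as pd

# Hypothetical CSV contents, inlined so the snippet runs without any files
csv_text = "filename,label\n4325.jpg,cat\n2345.jpg,dog\n9999.jpg,cat\n"
df = pd.read_csv(io.StringIO(csv_text))

# Sort the unique labels so the mapping is the same on every run
classes = sorted(df["label"].unique())
class2index = {name: i for i, name in enumerate(classes)}

print(class2index)  # {'cat': 0, 'dog': 1}
```

Build the mapping from the training CSV only and reuse it for the test set, so both datasets agree on which index means which class.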
        

Now you can use this class to load the training and test datasets from the CSV files and the images folder.


train_dataset = CustomDataset("path/to/train.csv", "path/to/images/folder")
test_dataset = CustomDataset("path/to/test.csv", "path/to/images/folder")


# Without a transform, image is a PIL image; pass a transform to get a tensor
image, label = train_dataset[0]
Mitiku