
I've downloaded some sample images from the MNIST dataset in .jpg format. Now I'm loading those images to test my pre-trained model.

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import transforms, datasets

# transforms to apply to the data
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])

# MNIST dataset
test_dataset = datasets.ImageFolder(root=DATA_PATH, transform=trans)

# Data loader
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

Here DATA_PATH contains a subfolder with the sample images.

Here's my network definition:

# Convolutional neural network (two convolutional layers)
class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.network2D = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2),  # expects 1 input channel
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),                 # 28x28 -> 14x14
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))                 # 14x14 -> 7x7
        self.network1D = nn.Sequential(
            nn.Dropout(),
            nn.Linear(7 * 7 * 64, 1000),
            nn.Linear(1000, 10))

    def forward(self, x):
        out = self.network2D(x)
        out = out.reshape(out.size(0), -1)  # flatten to [N, 7*7*64]
        out = self.network1D(out)
        return out

And this is my inference part:

# Test the model
model = torch.load("mnist_weights_5.pth.tar")
model.eval()

for images, labels in test_loader:
    outputs = model(images.cuda())

When I run this code, I get the following error:

RuntimeError: Given groups=1, weight of size [32, 1, 5, 5], expected input[1, 3, 28, 28] to have 1 channels, but got 3 channels instead

I understand that the images are getting loaded as 3 channels (RGB). So how do I convert them to a single channel in the dataloader?

Update: I changed the transforms to include the Grayscale option:

trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,)), transforms.Grayscale(num_output_channels=1)])

But now I get this error:

TypeError: img should be PIL Image. Got <class 'torch.Tensor'>
Harsh Wardhan

3 Answers


When the ImageFolder class is used with no custom loader, PyTorch uses PIL to load each image and converts it to RGB. This is the default loader when the torchvision image backend is PIL:

def pil_loader(path):
    with open(path, 'rb') as f:
        img = Image.open(f)
        return img.convert('RGB')
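
Since ImageFolder also accepts a loader argument, one option (a minimal sketch, not from the original answer, reusing the question's DATA_PATH and trans) is to pass a custom loader that keeps the image single-channel:

from PIL import Image
from torchvision import datasets

def gray_loader(path):
    # 'L' mode yields a single-channel (grayscale) PIL image
    with open(path, 'rb') as f:
        img = Image.open(f)
        return img.convert('L')

test_dataset = datasets.ImageFolder(root=DATA_PATH, transform=trans, loader=gray_loader)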

Alternatively, you can use torchvision's Grayscale transform. It converts a 3-channel RGB image into a 1-channel grayscale image. Note that Grayscale operates on PIL images, so it must be placed before ToTensor in the Compose; applying it after ToTensor is what caused the TypeError in your update. Find out more about it here.

Sample code is below:

import torchvision as tv
import torch.utils.data as data

dataDir        = 'D:\\general\\ML_DL\\datasets\\CIFAR'
trainTransform = tv.transforms.Compose([tv.transforms.Grayscale(num_output_channels=1),
                                        tv.transforms.ToTensor(),
                                        tv.transforms.Normalize((0.5,), (0.5,))])  # single-channel stats after Grayscale
trainSet       = tv.datasets.CIFAR10(dataDir, train=True, download=False, transform=trainTransform)
dataloader     = data.DataLoader(trainSet, batch_size=1, shuffle=False, num_workers=0)
images, labels = next(iter(dataloader))
print(images.size())  # torch.Size([1, 1, 32, 32])
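
Applied to the question's MNIST setup, the key point is the ordering: Grayscale is a PIL-level transform, so it must precede ToTensor. A sketch reusing the question's normalization constants:

trans = transforms.Compose([transforms.Grayscale(num_output_channels=1),  # PIL op: must come before ToTensor
                            transforms.ToTensor(),
                            transforms.Normalize((0.1307,), (0.3081,))])  # single-channel MNIST stats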
Thiagarajan
  • I get the following error with your code: `RuntimeError: output with shape [1, 32, 32] doesn't match the broadcast shape [3, 32, 32]` – ma3oun Oct 31 '19 at 14:34
  • This error was resolved by implementing this answer: https://stackoverflow.com/questions/55124407/output-and-broadcast-shape-mismatch-in-mnist-torchvision – salRad Feb 04 '20 at 18:03
  • 1
    i dont think this converts the image to grayscale because your either using Red or Green or Blue sections of the image and none are grayscale – Manjit Ullal Feb 23 '21 at 17:03

You can implement the DataLoader not from ImageFolder but from a custom Dataset, loading the images directly in its __getitem__ function: open the file with PIL.Image.open(...), convert it to grayscale, then convert to numpy and to a Tensor.
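
A minimal sketch of that idea (the image_paths and labels arguments are hypothetical, not from the question):

from PIL import Image
from torch.utils.data import Dataset

class GrayscaleImageDataset(Dataset):
    def __init__(self, image_paths, labels, transform=None):
        self.image_paths = image_paths  # list of file paths (hypothetical input)
        self.labels = labels            # matching list of integer labels
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        img = Image.open(self.image_paths[idx]).convert('L')  # 'L' = grayscale
        if self.transform is not None:
            img = self.transform(img)   # e.g. ToTensor + Normalize
        return img, self.labels[idx]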

Another option is to calculate the grayscale (Y) channel from RGB with the formula Y = 0.299 R + 0.587 G + 0.114 B, then slice the array down to one channel.
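
For example, assuming images is the [N, 3, H, W] float tensor produced by the question's loader, a sketch of that computation:

r, g, b = images[:, 0], images[:, 1], images[:, 2]
y = 0.299 * r + 0.587 * g + 0.114 * b  # luma, shape [N, H, W]
images_gray = y.unsqueeze(1)           # restore the channel dim: [N, 1, H, W]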

But how do you train your model? Usually the train and test data are loaded in the same way.

Sergei Chicherin

I found an extremely simple solution to this problem. The required dimensions of the tensor are [1, 1, 28, 28], whereas the input tensor is of the form [1, 3, 28, 28]. So I need to read just one channel from it:

images = images[:, 0, :, :]

This gives me a tensor of the form [1, 28, 28]. Now I need to convert this to a tensor of the form [1, 1, 28, 28], which can be done by re-inserting the channel dimension (unsqueeze(1) keeps the batch dimension in front, so it also works for batch sizes larger than one):

images = images.unsqueeze(1)

Putting the above two lines together, the prediction part of the code can be written like this:

for images, labels in test_loader:
    images = images[:, 0, :, :].unsqueeze(1)  # extract a single channel, then restore [N, 1, 28, 28]
    outputs = model(images.cuda())
Harsh Wardhan
  • The only issue is that you take information from just the red channel, which is not precise. – Sergei Chicherin Oct 02 '18 at 08:07
  • No, this might actually work if R=G=B. In that case there is no reason to convert; you can just slice one channel. I am, however, worried about how PyTorch converts the grayscale image to RGB, which may not simply copy the grayscale channel to R, G, and B. The default PyTorch loader is: `def pil_loader(path): with open(path, 'rb') as f: img = Image.open(f) return img.convert('RGB')` – saurabheights Jun 29 '19 at 16:15