
I have the following code portion:

import torch
from torch.utils.data import DataLoader
from PIL import Image
import numpy as np

dataset = trainDataset()  # custom Dataset defined elsewhere
train_loader = DataLoader(dataset, batch_size=1, shuffle=True)

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

images = []
image_labels = []

# collect every (image, label) batch from the loader
for i, data in enumerate(train_loader, 0):
    inputs, labels = data
    inputs, labels = inputs.to(device), labels.to(device)
    inputs, labels = inputs.float(), labels.float()
    images.append(inputs)
    image_labels.append(labels)

image = images[7]
image = image.numpy()  # call .cpu() first if the tensor lives on the GPU
image = image.reshape(416, 416, 3)
img = Image.fromarray(image, 'RGB')
img.show()

The issue is that the image doesn't display properly. My dataset contains images of cats and dogs, but the displayed image looks as shown below. Why is that?

[screenshot: the displayed image is scrambled noise, not a recognizable photo]

EDIT 1

So, after @flawr's nice explanation, I have the following:

image = images[7]
image = image[0, ...].permute([1, 2, 0])  # drop the batch dim, then CHW -> HWC
image = image.numpy()
img = Image.fromarray(image, 'RGB')
img.show()

Now the image looks as shown below. I'm not sure if this is a NumPy issue or a problem with how the image is represented and displayed. Note also that I get a different display on every run, but it is always close to the image shown below.

[screenshot: output after EDIT 1, still not displayed correctly]

EDIT 2

I think the issue now is with how the image is represented. Referring to this solution, I now have the following:

image = images[7]
image = image[0, ...].permute([1, 2, 0])  # drop the batch dim, then CHW -> HWC
image = image.numpy()
image = (image * 255).astype(np.uint8)  # scale floats in [0, 1] up to uint8 in [0, 255]
img = Image.fromarray(image, 'RGB')
img.show()

Which produces the following image as expected :-)

[screenshot: the cat/dog image now displays correctly]
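
For completeness: torchvision can do the whole tensor-to-PIL conversion in one step. This is just a sketch assuming, as above, a float CHW tensor with values in [0, 1]:

from torchvision.transforms import ToPILImage

# ToPILImage expects a (C, H, W) tensor and rescales floats in [0, 1] to uint8 itself
# (call .cpu() first if the tensor lives on the GPU)
img = ToPILImage()(images[7][0])  # index [0] drops the batch dimension
img.show()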

Simplicity

1 Answer


In pytorch you usually represent pictures with tensors of shape

(channels, height, width)

You then seem to reshape it to what you expect to be

(height, width, channels)

Note that these tensors or arrays are actually stored as a 1d "array", and the multiple dimensions just come from defining strides (check out How to understand numpy strides for layman?).
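
A quick way to see this (a minimal sketch, not part of the original answer) is to inspect the strides yourself:

import numpy as np

a = np.arange(12).reshape(3, 2, 2)
print(a.strides)               # e.g. (32, 16, 8) for int64: bytes to jump per dimension
b = a.reshape(2, 2, 3)
print(np.shares_memory(a, b))  # True: reshape reuses the same flat buffer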

In your particular case this means that consecutive values (which were values of the same color channel and the same row) are now interpreted as different color channels.

So let's say you have a 2x2 image with 3 color channels. Let's say it is a chessboard pattern. In pytorch that would look something like the following array of shape (3, 2, 2):

[[[1,0],[0,1]],[[1,0],[0,1]],[[1,0],[0,1]]]

The underlying internal array is just

[  1,0 , 0,1  ,  1,0 , 0,1  ,  1,0 , 0,1  ]

So reshaping to (2, 2, 3) would look like so:

[[[1,0,0],[1,1,0]],[[0,1,1],[0,0,1]]]

which immediately shows how the image will be completely jumbled. Reshaping really just means setting the brackets in different places!
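
You can check this in a couple of lines (a minimal sketch of the example above):

import torch

t = torch.tensor([[[1, 0], [0, 1]], [[1, 0], [0, 1]], [[1, 0], [0, 1]]])  # shape (3, 2, 2)
print(t.flatten().tolist())         # [1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1]
print(t.reshape(2, 2, 3).tolist())  # [[[1, 0, 0], [1, 1, 0]], [[0, 1, 1], [0, 0, 1]]]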

So what you probably want instead of reshape is permute([1, 2, 0]) (in numpy this is called transpose), which will actually rearrange the data.
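
Continuing the sketch above, permute gives each pixel its own three channel values:

print(t.permute([1, 2, 0]).tolist())
# [[[1, 1, 1], [0, 0, 0]], [[0, 0, 0], [1, 1, 1]]], the chessboard, pixel by pixel

(Strictly speaking, permute also just adjusts the strides and returns a view, but it changes which flat element each index maps to, so the logical layout comes out right; call .contiguous() if you need the data physically rearranged.)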

flawr
  • Thanks for your detailed answer. When I tried the solution, I got `RuntimeError: number of dims don't match in permute`. Why is that? – Simplicity Jul 17 '21 at 23:24
  • So `permute([1, 2, 0])` only works for tensors with three dimensions (that is, the `shape` has length three) - what shape does `image` have before you try to reshape/permute it? – flawr Jul 17 '21 at 23:28
  • Yes, that is what I thought about. This is the shape I got for the image: `torch.Size([1, 3, 416, 416])`. The first number is supposed to be the batch number. – Simplicity Jul 17 '21 at 23:29
  • Ah in that case you just need to get rid of that "dummy" batch dimension, you can just do `image[0, ...].permute([1,2,0])` – flawr Jul 17 '21 at 23:30
  • Thanks so much. Please see what I get now under **EDIT** in my question. – Simplicity Jul 17 '21 at 23:33