
Below is the data I passed to the DataLoader:

train_path='/content/drive/MyDrive/Dataset_manual_pytorch/train'
test_path='/content/drive/MyDrive/Dataset_manual_pytorch/test'

train = torchvision.datasets.ImageFolder(train_path,transform=transformations)
test = torchvision.datasets.ImageFolder(test_path,transform=transformations)

train_loader = torch.utils.data.DataLoader(train, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test, batch_size=32, shuffle=True)

This is my recurrent neural network (RNN) model:

hidden_size = 256
sequence_length = 28
num_classes = 2
num_layers = 2
input_size = 32
learning_rate = 0.001
num_epochs = 3

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers

        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        # Classifier over the flattened RNN outputs of all time steps
        self.fc = nn.Linear(hidden_size * sequence_length, num_classes)

    def forward(self, x):
        # Initial hidden state: (num_layers, batch, hidden_size)
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(device)

        # Forward prop
        out, _ = self.rnn(x, h0)
        out = out.reshape(out.shape[0], -1)
        out = self.fc(out)
        return out

model_rnn = RNN(input_size, hidden_size, num_layers, num_classes).to(device)

When I train this model on the training data for the given number of epochs, it gives me the following error:

RuntimeError: input must have 3 dimensions, got 4

The shape of data is: torch.Size([64, 3, 32, 32])

I think the error occurs because I am feeding in 4-dimensional data that still contains the three RGB channels. To fix it I need to reshape torch.Size([64, 3, 32, 32]) to torch.Size([64, 32, 32]), but I am unable to do this.
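For context, nn.RNN with batch_first=True expects a 3-dimensional input of shape (batch, seq_len, input_size), which is why a 4-dimensional image batch is rejected. A minimal sketch (made-up tensors, using the hyperparameters from the question) that reproduces the mismatch:

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=32, hidden_size=256, num_layers=2, batch_first=True)

x3d = torch.randn(64, 28, 32)     # (batch, seq_len, input_size) -- accepted
out, _ = rnn(x3d)                 # out.shape == torch.Size([64, 28, 256])

x4d = torch.randn(64, 3, 32, 32)  # an image batch straight from the DataLoader
# rnn(x4d)                        # raises a RuntimeError about the number of input dimensions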

The training code is:

@torch.no_grad()
def Validation_phase(model, val_loader):
    model.eval()
    for data, labels in val_loader:
      out = model(data)
      val_loss = F.cross_entropy(out, labels)
      val_acc = accuracy(out, labels)

    return val_loss.detach(), val_acc

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training Phase 
        model.train()
        train_losses = []
        train_accuracy = []  
        for data, labels in train_loader:
            #forward

            print(data.shape)
            out = model(data)
            #loss calculate
            train_loss = F.cross_entropy(out, labels)

            #Accuracy
            train_acc = accuracy(out, labels)

            train_accuracy.append(train_acc)
            train_losses.append(train_loss.item())

            #back_propagate
            train_loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    
        train_accuracy = np.mean(torch.stack(train_accuracy).numpy())
        train_losses = np.mean(train_losses)
        

        #Validation phase
        val_losses, val_accuracy = Validation_phase(model, val_loader)

        print("Epoch [{}], train_loss: {:.4f}, train_accuracy: {:.4f}, val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, train_losses*100 , train_accuracy*100 , val_losses.item()*100, val_accuracy.item()*100))
        # history.append(result)
    # return history

fit(5, 0.001, model_rnn, train_loader, test_loader, torch.optim.Adam) 

2 Answers


You can convert torch.Size([64, 3, 32, 32]) to torch.Size([64, 32, 32]) with the code below, which keeps only the first channel:

x = torch.ones((64, 3, 32, 32))
x = x[:, 0, :, :]   # select the first channel only, dropping the other two

# Check:
print(x.size())     # torch.Size([64, 32, 32])
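For reference, here is how the sliced batch would flow through the model from the question. Note that with 32×32 images the time dimension becomes 32, so sequence_length (which sizes the fc layer) would need to be 32 rather than 28 for the shapes to line up. A minimal sketch, assuming the hyperparameters from the question:

import torch
import torch.nn as nn

batch = torch.ones(64, 3, 32, 32)[:, 0, :, :]    # (64, 32, 32): (batch, seq_len, input_size)

rnn = nn.RNN(input_size=32, hidden_size=256, num_layers=2, batch_first=True)
out, _ = rnn(batch)                              # (64, 32, 256)
flat = out.reshape(out.shape[0], -1)             # (64, 32 * 256) = (64, 8192)

# The question's fc layer expects hidden_size * sequence_length = 256 * 28 = 7168 features,
# so sequence_length would have to be 32 here for nn.Linear to accept `flat`.
print(flat.shape)                                # torch.Size([64, 8192])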

You could take the average of the RGB channels with torch.mean to get a single-channel representation of the image. You may need to keep the averaged dimension in order to feed your model properly; you can do so with the keepdim option:

>>> x = torch.rand(64, 3, 32, 32)
>>> z = x.mean(1, keepdim=True)
>>> z.shape
torch.Size([64, 1, 32, 32])

Keep in mind, there are other ways of going about this (RGB to grayscale conversion).
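For the shape the question asks for (torch.Size([64, 32, 32])), the same averaging without keepdim drops the channel dimension entirely; a small sketch:

import torch

x = torch.rand(64, 3, 32, 32)
z = x.mean(1)        # average over the channel dimension, no keepdim -> (64, 32, 32)
print(z.shape)       # torch.Size([64, 32, 32])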

  • Yes, I can take the mean of all 3 channels, but the accuracy of the model will be affected. I could convert all RGB images to grayscale, but that is too computationally expensive since I have millions of images. Thank you <3 – Tariq Hussain Jul 30 '21 at 09:36
  • Of course it will affect your model; this effectively reduces the amount of information in your data, and there is no magic way around it. Alternatively, a common approach is to preprocess your data once: in your case, compute the grayscale images once and use the converted grayscale images in your training. What do you think? – Ivan Jul 30 '21 at 09:50
  • I am converting RGB to grayscale using OpenCV and then loading the converted images back into the DataLoader so they stay batched. – Tariq Hussain Jul 30 '21 at 13:37
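As a side note, the grayscale conversion can also happen inside the existing torchvision transform pipeline rather than as a separate OpenCV pass. A hedged sketch (the contents of transformations are an assumption, since the question does not show them, and train_path is the path from the question):

import torchvision
from torchvision import transforms

# Assumed pipeline: Grayscale is applied on the fly by ImageFolder's transform,
# so no separate preprocessing pass or re-export of the dataset is needed.
transformations = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),  # RGB (3, H, W) -> (1, H, W)
    transforms.ToTensor(),
])

train = torchvision.datasets.ImageFolder(train_path, transform=transformations)
# Each batch is then (64, 1, 32, 32); squeezing dim 1 gives the (64, 32, 32) shape the RNN expects.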