Confused about tensor dimensions and batch sizes in pytorch

Question

So I'm very new to PyTorch and Neural Networks in general, and I'm having some problems creating a Neural Network that classifies names by gender.
I based this off of the PyTorch tutorial for RNNs that classify names by nationality, but I decided not to go with a recurrent approach... Stop me right here if this was the wrong idea!
However, whenever I try to run an input through the network it tells me:

RuntimeError: matrices expected, got 3D, 2D tensors at /py/conda-bld/pytorch_1493681908901/work/torch/lib/TH/generic/THTensorMath.c:1232

I know this has something to do with how PyTorch always expects there to be a batch size or something, and I have my tensor set up that way, but you can probably tell by this point that I have no idea what I'm talking about. Here's my code:

from future import unicode_literals, print_function, division
from io import open
import glob
import unicodedata
import string
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
import random
from torch.autograd import Variable
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

"""------GLOBAL VARIABLES------"""

all_letters = string.ascii_letters + " .,;'"
num_letters = len(all_letters)
all_names = {}
genders = ["Female", "Male"]

"""-------DATA EXTRACTION------"""

def findFiles(path):
    return glob.glob(path)

def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
        and c in all_letters
    )

# Read a file and split into lines
def readLines(filename):
    lines = open(filename, encoding='utf-8').read().strip().split('\n')
    return [unicodeToAscii(line) for line in lines]

for file in findFiles("/home/andrew/PyCharm/PycharmProjects/CantStop/data/names/*.txt"):
    gender = file.split("/")[-1].split(".")[0]
    names = readLines(file)
    all_names[gender] = names

"""-----DATA INTERPRETATION-----"""

def nameToTensor(name):
    tensor = torch.zeros(len(name), 1, num_letters)
    for index, letter in enumerate(name):
        tensor[index][0][all_letters.find(letter)] = 1
    return tensor

def outputToGender(output):
    gender, gender_index = output.data.topk(1)
    if gender_index[0][0] == 0:
        return "Female"
    return "Male"

"""------NETWORK SETUP------"""

class Net(nn.Module):
    def __init__(self, input_size, output_size):
        super(Net, self).__init__()
        #Layer 1
        self.Lin1 = nn.Linear(input_size, int(input_size/2))
        self.ReLu1 = nn.ReLU()
        self.Batch1 = nn.BatchNorm1d(int(input_size/2))
        #Layer 2
        self.Lin2 = nn.Linear(int(input_size/2), output_size)
        self.ReLu2 = nn.ReLU()
        self.Batch2 = nn.BatchNorm1d(output_size)
        self.softMax = nn.LogSoftmax()

    def forward(self, input):
        output1 = self.Batch1(self.ReLu1(self.Lin1(input)))
        output2 = self.softMax(self.Batch2(self.ReLu2(self.Lin2(output1))))
        return output2

NN = Net(num_letters, 2)

"""------TRAINING------"""

def getRandomTrainingEx():
    gender = genders[random.randint(0, 1)]
    name = all_names[gender][random.randint(0, len(all_names[gender])-1)]
    gender_tensor = Variable(torch.LongTensor([genders.index(gender)]))
    name_tensor = Variable(nameToTensor(name))
    return gender_tensor, name_tensor, gender

def train(input, target):
    loss_func = nn.NLLLoss()

    optimizer = optim.SGD(NN.parameters(), lr=0.0001, momentum=0.9)

    optimizer.zero_grad()

    output = NN(input)

    loss = loss_func(output, target)
    loss.backward()
    optimizer.step()

    return output, loss

all_losses = []
current_loss = 0

for i in range(100000):
    gender_tensor, name_tensor, gender = getRandomTrainingEx()
    output, loss = train(name_tensor, gender_tensor)
    current_loss += loss

    if i%1000 == 0:
        print("Guess: %s, Correct: %s, Loss: %s" % (outputToGender(output), gender, loss.data[0]))

    if i%100 == 0:
        all_losses.append(current_loss/10)
        current_loss = 0

# plt.figure()
# plt.plot(all_losses)
# plt.show()

Please help a newbie out!

Well, in my opinion, doing a Recurrent Neural Network for a task of classifying text into 2 categories seems like an overkill; a regular Neural Network should be sufficient. Recurrent NNs are better suited for things that require "short term memory", and are great applications for video and image analysis. Also, would not be surprised if "simpler" approaches, like Bayesian Classification (like Naive Bayes) could perform this tasks without problems. — DarkCygnus, Jul 10 '17 at 19:36
@GrayCygnus That's what I thought too, in that case, does my network and input/output structure seem to be set up properly? — Andrew Kirillov, Jul 10 '17 at 20:01
I also see a big issue in your possible approach... Neural Networks have a fixed input size, that is you design and train it to receive inputs (vectors) of certain shape and gives as output the classification. How are you going to obtain that feature from your data, as names have different sizes? — DarkCygnus, Jul 10 '17 at 20:16
You actually might want to use a RNN here, as they are good at processing sequences, and strings can be viewed as sequences of characters. I would consider using a bidirectional LSTM here. There's also been a bunch of work learning representations of words that has learned gender; check out `word2vec` as an example. — finbarr, Jul 10 '17 at 22:30
@GrayCygnus you're one step ahead of me. I just realized this input size issue yesterday, what it's looking like right now is that the first dimension of the input tensor, the number of letters in the name, is recognized by PyTorch as the input's batch size, which means the output and target tensors should have first dimensions of this same size... I either need to cleverly craft target tensors that will work or somehow condense the output to a 1x2 tensor... i think — Andrew Kirillov, Jul 13 '17 at 14:12
To be honest (and without the intention of killing your vibe) if I were you I would take a different approach, as NN require input sizes to adapt to varying sizes you would have to design and train *on-the-fly* every time you get an input of size N, or reuse a previous model, something quite hard to achieve (you will have to obtain train and test data for all N's.... you see my point?). There are some options you can do, I will write an answer explaining this as it is an important thing to consider. — DarkCygnus, Jul 13 '17 at 15:30

score 2 · Answer 1 · answered Jul 11 '17 at 02:36

Debugging your bug out:

Pycharm is a helpful python debugger that let you set breakpoint and views dimension of your tensor.
For easier debug, do not stack forward thing up like that

output1 = self.Batch1(self.ReLu1(self.Lin1(input)))

Instead,

h1 = self.ReLu1(self.Lin1(input))
h2 = self.Batch1(h1)

For the stacktrace, Pytorch also provide Pythonic error stacktrack. I believe that before

RuntimeError: matrices expected, got 3D, 2D tensors at /py/conda-bld/pytorch_1493681908901/work/torch/lib/TH/generic/THTensorMath.c:1232

There are some python error stacktrace that point right into your code. For easier debug, as I said, don't stack forward.

You use Pycharm to create break point before crash point. In debugger watcher Then use Variable(torch.rand(dim1, dim2)) to test out forward pass input, output dimension, and if a dimension is incorrect. Comparing with dimension of input. Call input.size() in debugger watcher.

For example, self.ReLu1(self.Lin1(Variable(torch.rand(10, 20)))).size() . If it show read text (error), then the input dimension is incorrect. Else, it show the size of the output.

Read the docs

In Pytorch Docs, it specify input/output dimension. It also have a example code snip

>>> rnn = nn.RNN(10, 20, 2)
>>> input = Variable(torch.randn(5, 3, 10))
>>> h0 = Variable(torch.randn(2, 3, 20))
>>> output, hn = rnn(input, h0)

You may use the code snip in PyCharm Debugger to explore dimension of input, output of specific layer of your interest (RNN, Linear, BatchNorm1d).

score 0 · Answer 2 · edited Jun 20 '20 at 09:12

First, regarding your error, as other answers say and also your exception, it is probably because your input parameters are not shaped correctly. You could try debugging to isolate the line that gives the error, and then edit your question with it, so we know for sure what is causing the problem and correct it (without full stack trace it is harder to know what is the problem).

Now, you are trying to implement a Neural Network that classifies names by gender, as you indicated. We can see that this task will require to somehow input a name (which have different sizes) and output a gender (a binary variable: male, female). However, Neural Networks in general are built and trained to classify inputs (vectors) of fixed size of features, like they mention in the pytorch docs:

Parameters: input_size – The number of expected features in the input x

...

Looking at the tutorial you mentioned, they do consider this situation, as in their case the input for the network is a single letter transformed to a "one-hot vector", as they indicate:

To run a step of this network we need to pass an input (in our case, the Tensor for the current letter) and a previous hidden state (which we initialize as zeros at first). We’ll get back the output (probability of each language) and a next hidden state (which we keep for the next step).

And even give an example of it (remember tensors are Variables in pytorch):

input = Variable(letterToTensor('A'))
hidden = Variable(torch.zeros(1, n_hidden))
output, next_hidden = rnn(input, hidden)

Note: That being said, there are some other things you can do to adapt your implementation to variable-sized inputs. Based on my experience and also complemented by this and this other great questions, you could:

Preprocess your data to extract new features and transform it to fixed-size inputs. This is usually the most used approach but requires experience and patience to get good features. Some techniques used are PCA (Principal Component Analysis) and LDA (Latent Dirichlet Allocation)

For example, you could extract from your data features like: the length of the name, the number of letter a's in the name (female names tend to have more a's), the number of letter e's in the name (the same but with male names maybe?), and others... so you can generate new features like [name_length, a_found, e_found, ...]. Then you could follow a regular approach with you new fixed-size vectors. Do note that those features have to be meaningful; these ones I just came up for example (although they could work).
Split your input names into fixed-sized substring (or iterate them with a sliding window), so then you can classify them with a network designed for that size and combine the outputs in an ensemble way to obtain the final classification.

Confused about tensor dimensions and batch sizes in pytorch

2 Answers2

Linked