Sorry if I don't present my problem clearly; English is not my first language.

Problem

Short description:

I want to train a model that maps an input x (of shape [n_sample, timestamp, feature]) to an output y of exactly the same shape. It's like mapping between two spaces.

Longer version:

I have two float ndarrays of shape [n_sample, timestamp, feature], holding the MFCC features of n_sample audio files. The two arrays are two speakers' recordings of the same corpus, aligned with DTW. Let's call these arrays x and y. I want to train a model that predicts y[k] given x[k]. It's like mapping from space x to space y, and the output must have exactly the same shape as the input.

What I've tried

This is a time-series problem, so I decided to use an RNN approach. Here is my code in PyTorch (I put comments along the code and removed the calculation of the average loss for simplicity). Note that I've tried many learning rates and the behavior is still the same.

Class definition:

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, in_size, hidden_size, out_size, nb_lstm_layers):
        super().__init__()
        self.in_size = in_size
        self.hidden_size = hidden_size
        self.out_size = out_size
        self.nb_lstm_layers = nb_lstm_layers

        # self.fc1 = nn.Linear()
        self.lstm = nn.LSTM(input_size=self.in_size, hidden_size=self.hidden_size, num_layers=self.nb_lstm_layers, batch_first=True, bias=True)
        # self.fc = nn.Linear(self.hidden_size, self.out_size)
        self.fc1 = nn.Linear(self.hidden_size, 128)
        self.fc2 = nn.Linear(128, 128)
        self.fc3 = nn.Linear(128, self.out_size)

    def forward(self, x, h_state):
        # x: (batch, frames, in_size) because the LSTM uses batch_first=True
        out, h_state = self.lstm(x, h_state)
        # nn.Linear acts on the last dimension, so the fully connected layers are
        # applied to every frame at once and the output keeps the input's shape
        # (fc2 is defined above but not used here)
        out = self.fc3(torch.tanh(self.fc1(out)))

        return out, h_state

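    def hidden_init(self):
        # nn.LSTM expects a tuple (h_0, c_0), each (num_layers, batch, hidden_size);
        # batch_size and use_cuda are globals defined elsewhere in the script
        device = torch.device("cuda" if use_cuda else "cpu")
        shape = (self.nb_lstm_layers, batch_size, self.hidden_size)

        return torch.zeros(shape, device=device), torch.zeros(shape, device=device)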

Training step:

import torch.optim as optim

net = Net(20, 20, 20, nb_lstm_layers)  # in_size, hidden_size, out_size, number of LSTM layers
optimizer = optim.Adam(net.parameters(), lr=0.0001, weight_decay=0.0001)
criterion = nn.MSELoss()

for epoch in range(nb_epoch):
    count = 0
    loss_sum = 0

    batch_x = None
    for i in range(len(data)):
        # data is my entire dataset; data[i] holds one (x, y) pair as described above
        temp_x = torch.tensor(data[i][0])
        temp_y = torch.tensor(data[i][1])

        for ii in range(0, data[i][0].shape[0] - nb_frame_in_batch*2 + 1): # Create batches
            batch_x, batch_y = get_batches(temp_x, temp_y, ii, batch_size, nb_frame_in_batch)
            # get_batches returns 2 tensors of shape (batch_size, nb_frame_in_batch, 20),
            # where batch_size is the number of samples fed to the net at once and
            # nb_frame_in_batch is the number of frames in each sample
            optimizer.zero_grad()

            h_state = net.hidden_init()

            prediction, h_state = net(batch_x.float(), h_state)
            loss = criterion(prediction.float(), batch_y.float())

            h_state = (h_state[0].detach(), h_state[1].detach())

            loss.backward()
            optimizer.step()

The problem is that the loss does not seem to decrease; it fluctuates a lot, with no clear trend.

Please help me. Any suggestions would be greatly appreciated. If somebody could inspect my code and comment on it, that would be very kind.
Thanks in advance!

enamoria
  • If your algorithm uses gradient descent, this unexpected behavior might be due to a learning rate that is too high. If you take too big a step, you may move away from the local minimum of the loss function instead of approaching it. In short: try decreasing your learning rate. – user2314737 Dec 07 '18 at 10:12
  • Thanks for your suggestion. I've tried many values for lr, and it doesn't seem to be lr's fault. I think the bad behavior comes from the architecture. I will edit my post to use a smaller learning rate, since `0.01` is too high anyway – enamoria Dec 07 '18 at 10:15
  • Anyway, your question might be better suited for https://datascience.stackexchange.com/ or https://stats.stackexchange.com/ since it's not a programming issue. – user2314737 Dec 07 '18 at 10:21
  • I've considered it. But in my case, besides seeking suggestions for the problem in general, I also want somebody to inspect my code and tell me what I'm doing wrong. So SO suits it better than DS/stats – enamoria Dec 07 '18 at 10:27

3 Answers

It seems the network is learning nothing from your data, hence the loss fluctuation (the weights then depend only on random initialization). Here are some things you can try:

  • Try normalizing the data. This suggestion is broad, and I can't give more detail without seeing your data, but scaling it to a fixed range such as [0, 1], or to a given mean and standard deviation, is worth trying.
  • A very common problem with LSTMs in PyTorch is that the expected input shape differs from that of other kinds of networks. By default, nn.LSTM expects a tensor of shape (seq_len, batch, input_size); with batch_first=True it expects (batch, seq_len, input_size) instead. See the LSTM section of the PyTorch documentation for details.
  • One more thing: try tuning your hyperparameters. In my experience, LSTMs are harder to train than fully connected networks or CNNs.
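
To make the shape point concrete, here is a minimal sketch of the two layouts nn.LSTM accepts. The sizes are made up for illustration and are not taken from the question's data.

```python
import torch
import torch.nn as nn

batch, seq_len, n_feat, hidden, n_layers = 4, 50, 20, 20, 2  # illustrative sizes

# Default layout (batch_first=False): input is (seq_len, batch, input_size).
lstm_default = nn.LSTM(input_size=n_feat, hidden_size=hidden, num_layers=n_layers)
x_time_major = torch.randn(seq_len, batch, n_feat)
out_default, (h_n, c_n) = lstm_default(x_time_major)

# batch_first=True: input is (batch, seq_len, input_size).
lstm_bf = nn.LSTM(input_size=n_feat, hidden_size=hidden,
                  num_layers=n_layers, batch_first=True)
x_batch_major = x_time_major.transpose(0, 1)  # swap the first two axes
out_bf, (h_bf, c_bf) = lstm_bf(x_batch_major)

print(out_default.shape)      # torch.Size([50, 4, 20])
print(out_bf.shape)           # torch.Size([4, 50, 20])
print(h_n.shape, h_bf.shape)  # both torch.Size([2, 4, 20])
```

Note that batch_first only changes the layout of the input and output tensors. The hidden and cell states stay (num_layers, batch, hidden_size) in both cases, so a batch tensor whose axes are in the wrong order can run without an error when the sizes happen to match and still train on scrambled sequences.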

Let me know if this helps. Debugging a neural network is always hard, and coding mistakes are easy to make.

Cypherius
  • The 2nd point is exactly what I needed. The shape I fed in was incorrect. The loss is decreasing steadily now. Thanks – enamoria Dec 11 '18 at 02:01

With most ML algorithms it is tough to diagnose a problem without seeing the data. Given how inconsistent your loss is, this might be an issue with your data pre-processing. Have you tried normalizing the data first? Often, when results fluctuate widely, one of your input values is skewing the loss function so that it cannot find a good direction. "How to normalize a NumPy array to within a certain range?" gives an example for audio normalization. I would also try adjusting the learning rate, as it looks high, and possibly removing a hidden layer.
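
As a rough illustration of the normalization suggested here, the sketch below standardizes an array of shape (n_sample, timestamp, feature) per feature and, as an alternative, rescales each feature to [0, 1]. The array is random stand-in data, since the real MFCC arrays are not shown, and the small epsilon is an assumption to avoid division by zero.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for MFCC data of shape (n_sample, timestamp, feature).
x = rng.normal(loc=5.0, scale=3.0, size=(8, 100, 20))

# Statistics per feature, pooled over samples and time steps.
mean = x.mean(axis=(0, 1), keepdims=True)  # shape (1, 1, 20)
std = x.std(axis=(0, 1), keepdims=True)
x_std = (x - mean) / (std + 1e-8)          # roughly zero mean, unit variance

# Alternative: min-max scaling of each feature to [0, 1].
lo = x.min(axis=(0, 1), keepdims=True)
hi = x.max(axis=(0, 1), keepdims=True)
x_01 = (x - lo) / (hi - lo + 1e-8)
```

Whichever variant is used, the statistics would normally be computed on the training data only and reused for any held-out data, and the target array y needs the same treatment with its own statistics.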

Patrick Maynard
  • Sorry for the late response. My data is audio MFCC features, and I've tried training the network on unnormalized data, data normalized along the feature dimension, and data normalized along the timestamp dimension (by "normalized" I mean several kinds of normalization: [0, 1], [-1, 1], ...). Nothing helped. I think the problem comes from the architecture – enamoria Dec 11 '18 at 01:17

Maybe the problem is in the calculation of the loss. Try summing the losses over the time steps of each sequence and then averaging over the batch. That may help.
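
One possible reading of this suggestion, sketched in PyTorch with made-up tensor sizes: keep the per-element errors with reduction="none", sum them over time, and then average over the batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
prediction = torch.randn(4, 50, 20)  # (batch, time, feature), illustrative sizes
target = torch.randn(4, 50, 20)

# Keep the per-element squared errors instead of averaging everything at once.
per_element = nn.MSELoss(reduction="none")(prediction, target)  # (4, 50, 20)

per_step = per_element.mean(dim=2)  # mean over features  -> (batch, time)
per_sequence = per_step.sum(dim=1)  # sum over time steps -> (batch,)
loss = per_sequence.mean()          # average over batch  -> scalar

# The default reduction averages over every element in one go.
default_loss = nn.MSELoss()(prediction, target)
```

When all sequences in a batch have the same length, this loss equals the default MSELoss multiplied by the number of time steps, so it only rescales the gradient. The difference becomes meaningful mainly with padded, variable-length sequences.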