My question is very similar to this SO post: How to embed Sequence of Sentences in RNN?

Using this code snippet as an example (2 sequences, where each timestep observation contains 3 numerical features):

import numpy as np

np.random.seed(1)
num_seq = 2
num_features = 3
MAX_KNOWN_RESPONSE_VALUE = 120

lengths = np.random.randint(low=30, high=30000, size=num_seq)
# lengths = array([29763,   265])

X_batch = [np.random.rand(length, num_features) for length in lengths]
# X_batch[0].shape = (29763, 3)
# X_batch[1].shape = (265, 3)

y_batch = MAX_KNOWN_RESPONSE_VALUE * np.random.rand(num_seq)
# y_batch = array([35.51784086, 96.78678551])

The only differences (compared to the linked StackOverflow post) are:

  1. The sequences have variable lengths, ranging from 30 to 30000 timestep observations. In the above example with 2 sequences, the first sequence has length 29763 and the second has length 265.
  2. Each timestep observation within a sequence contains a set of numerical features that have no inherent ordering. In the above example, each timestep observation has 3 such features.
  3. The response per sequence is numerical, not categorical. MAX_KNOWN_RESPONSE_VALUE is the maximum response value observed in the training set, but at inference time the model should be able to predict any nonnegative value (negative responses are not allowed, while predictions above MAX_KNOWN_RESPONSE_VALUE are permitted).

Some initial thoughts on adapting the solution from the linked StackOverflow post:

  • Regarding (1), in order for the RNN to learn appropriately while minimizing the amount of zero-padding required, I am thinking that I would need to define a BySequenceLengthSampler and pass it into my DataLoader (first sketch after this list). I also think it is worthwhile to look at packing my padded sequences with torch.nn.utils.rnn.pack_padded_sequence, but I'm not entirely sure whether that is specifically useful for my problem.

  • Regarding (2), I think there are multiple options here. My initial thoughts are inspired by this solution, except that I'd drop the embedding step and start off with a Linear layer (instead of the LSTM layer used in that solution) to map (batch_size, max_seq_len, num_features) to (batch_size, max_seq_len, hidden_size), which would then go through an RNN cell, resulting in (batch_size, max_seq_len, hidden_size_2). Alternatively, I could pass (batch_size, max_seq_len, num_features) directly through an RNN cell to get (batch_size, max_seq_len, hidden_size) (second sketch below).

  • Regarding (3), instead of using torch.sigmoid, which squashes the output into (0, 1) (clearly more suited to classification), I would either add a ReLU nonlinearity after the last Linear layer or add no nonlinearity at all and let the Linear layer produce the prediction. I may need to transform my response values so that the weights don't explode for very large responses, but I'm not sure that's needed. However, I should definitely standardize my input features to mean 0 and standard deviation 1 (third sketch below).
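
Here is a minimal sketch of what I have in mind for (1). As far as I know, BySequenceLengthSampler is not a built-in PyTorch class, so this is my own simplified version of the idea: sort the indices by sequence length and chunk them into batches, with a collate function that pads each batch. Since this sampler yields lists of indices, it goes into DataLoader's batch_sampler argument rather than sampler:

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Sampler

class BySequenceLengthSampler(Sampler):
    # yield batches of indices with similar sequence lengths to minimize padding
    def __init__(self, lengths, batch_size):
        self.batch_size = batch_size
        self.sorted_indices = sorted(range(len(lengths)), key=lambda i: lengths[i])

    def __iter__(self):
        for i in range(0, len(self.sorted_indices), self.batch_size):
            yield self.sorted_indices[i:i + self.batch_size]

    def __len__(self):
        return (len(self.sorted_indices) + self.batch_size - 1) // self.batch_size

def collate_fn(batch):
    # batch: list of (sequence, target) pairs, each sequence a (seq_len, num_features) tensor
    seqs, targets = zip(*batch)
    seq_lengths = torch.tensor([s.shape[0] for s in seqs])
    padded = pad_sequence(seqs, batch_first=True)  # (batch_size, max_seq_len, num_features)
    return padded, seq_lengths, torch.tensor(targets, dtype=torch.float32)

# usage, assuming a Dataset that yields (sequence, target) pairs:
# loader = DataLoader(dataset, batch_sampler=BySequenceLengthSampler(lengths, batch_size=32),
#                     collate_fn=collate_fn)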
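
For (2), a sketch of the model variant with the Linear layer in front of the RNN; I pack the padded batch inside forward so the GRU skips the padded timesteps, and hidden_size / hidden_size_2 are arbitrary placeholder values:

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

class SequenceRegressor(nn.Module):
    def __init__(self, num_features=3, hidden_size=64, hidden_size_2=32):
        super().__init__()
        self.input_proj = nn.Linear(num_features, hidden_size)  # replaces the embedding step
        self.rnn = nn.GRU(hidden_size, hidden_size_2, batch_first=True)
        self.head = nn.Linear(hidden_size_2, 1)

    def forward(self, padded, seq_lengths):
        x = self.input_proj(padded)  # (batch_size, max_seq_len, hidden_size)
        packed = pack_padded_sequence(x, seq_lengths.cpu(), batch_first=True,
                                      enforce_sorted=False)
        _, h_n = self.rnn(packed)    # h_n: (1, batch_size, hidden_size_2)
        return self.head(h_n[-1]).squeeze(-1)  # one unconstrained value per sequence

The second option I mentioned would simply drop input_proj and construct the recurrent layer as nn.GRU(num_features, hidden_size, batch_first=True).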
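
And for (3), the preprocessing and output handling I have in mind, reusing X_batch and y_batch from the snippet above (the log1p transform is just one option I'm considering for keeping large targets in check):

import numpy as np
import torch

# standardize features with training-set statistics (per feature, across all timesteps)
all_obs = np.concatenate(X_batch, axis=0)  # (total timesteps, num_features)
feat_mean, feat_std = all_obs.mean(axis=0), all_obs.std(axis=0)
X_std = [torch.as_tensor((x - feat_mean) / feat_std, dtype=torch.float32) for x in X_batch]

# option A: clamp predictions at zero with a ReLU on the model output
# preds = torch.relu(model(padded, seq_lengths))

# option B: keep the head linear and compress the target scale instead
y_train = torch.as_tensor(np.log1p(y_batch), dtype=torch.float32)
# preds = torch.expm1(model(padded, seq_lengths))  # invert at inference; clamp at zero
#                                                  # afterwards, since expm1 of a negative
#                                                  # output can go slightly below zero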

Am I approaching this problem correctly?

Could you provide a reproducible code solution for modeling this, along with a detailed explanation?
