I'm still getting to grips with LSTMs and trying to work out an appropriate training routine and data shape.
The time series represents musical notes; let's call it a song. The notes are one-hot encoded, so a single series has shape (timesteps, features). Twelve copies of the series are then made by transposing it (shifting all of its notes up), so one song ends up with shape (12, timesteps, features). Each of these twelve series should be trained on independently. In addition, there are multiple songs, and they vary in length.
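For concreteness, this is roughly how I picture building the twelve transposed copies. It's a simplified sketch: it assumes each feature index corresponds directly to a pitch, so transposing is just a roll along the feature axis, and it ignores notes that would fall off the top of the range.

import numpy as np

def make_transpositions(song, n_copies=12):
    # song: (timesteps, features) one-hot array; each feature index is a pitch.
    # Rolling along the feature axis shifts every note up by 'shift' steps
    # (simplified: notes wrapping past the top of the range are not handled).
    return np.stack([np.roll(song, shift, axis=-1) for shift in range(n_copies)])

# Result has shape (12, timesteps, features).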
I'd like to train an LSTM so that a prediction is made at every step of a series. The training data for one of the twelve series would therefore be X = series[:-1, :] and Y = series[1:, :], and similarly for all twelve versions.
# Example data, numbers not one-hot encoded for brevity
series = [1, 3, 2, 4, 7, 7, 10]
X = [1, 3, 2, 4, 7, 7]
Y = [3, 2, 4, 7, 7, 10] # Shifted 1 step back
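If the twelve transposed copies are stacked into one array, I think the shifted input/target pairs for all of them at once would look like this (the array name and sizes are just placeholders for illustration):

import numpy as np

# Dummy stand-in for the stacked one-hot variants described above:
# (12 copies, 100 time steps, 60 pitch features) -- numbers are made up.
variants = np.zeros((12, 100, 60))

X = variants[:, :-1, :]   # inputs:  every step except the last, for each copy
Y = variants[:, 1:, :]    # targets: the same steps shifted one ahead
# X and Y both have shape (12, 99, 60), i.e. (12, timesteps - 1, features)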
The twelve variations form a natural batch, since their lengths are identical. But my question is: can the training be arranged so that these variants are fed to the network as a batch of twelve, while the training is still performed many-to-many (one prediction per time step)?
Currently I have what seems to be a naïve approach for a single example: it feeds the time steps to the network one by one, preserving state in between.
# X has shape (12 * timesteps, 1, features), Y has shape (12 * timesteps, features)
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(256, input_shape=(None, X.shape[-1]), batch_size=1, stateful=True))
model.add(Dense(Y.shape[-1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['categorical_accuracy'])

# Feed one time step at a time, resetting the LSTM state after each pass over the song
for epoch in range(10):
    model.fit(X, Y, epochs=1, batch_size=1, shuffle=False)
    model.reset_states()
How might this training regime be achieved for a single song of twelve variations?
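For what it's worth, my guess is something along these lines, using return_sequences=True and the twelve variants as a single batch, but I'm not sure whether this actually trains many-to-many the way I intend (the array names and sizes are placeholders):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

# Dummy shapes standing in for one song's twelve variants (numbers made up).
n_variants, n_steps, n_features = 12, 100, 60
X = np.zeros((n_variants, n_steps - 1, n_features))
Y = np.zeros((n_variants, n_steps - 1, n_features))

model = Sequential()
# return_sequences=True so the LSTM emits an output at every time step
model.add(LSTM(256, input_shape=(None, n_features), return_sequences=True))
# Dense is applied per time step, giving a note distribution for each step
model.add(Dense(n_features, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['categorical_accuracy'])

# All twelve variants of one song fed as a single batch
model.fit(X, Y, epochs=10, batch_size=12)

Is that the right arrangement, or does per-time-step training require something different?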