
I am reading this article (The Unreasonable Effectiveness of Recurrent Neural Networks) and want to understand how to express one-to-one, one-to-many, many-to-one, and many-to-many LSTM neural networks in Keras. I have read a lot about RNNs and understand how LSTM NNs work, in particular vanishing gradients, LSTM cells, their outputs and states, sequence output, etc. However, I have trouble expressing all these concepts in Keras.

To start with, I have created the following toy NN using an LSTM layer:

from keras.models import Model
from keras.layers import Input, LSTM
import numpy as np

t1 = Input(shape=(2, 3))
t2 = LSTM(1)(t1)
model = Model(inputs=t1, outputs=t2)

inp = np.array([[[1,2,3],[4,5,6]]])
model.predict(inp)

Output:

array([[ 0.0264638]], dtype=float32)

In my example the input shape is 2 by 3. As far as I understand, this means that the input is a sequence of 2 vectors and each vector has 3 features, and hence my input must be a 3D tensor of shape (n_examples, 2, 3). In terms of 'sequences', the input is a sequence of length 2, and each element in the sequence is expressed by 3 features (please correct me if I am wrong). When I call predict it returns a 2-dim tensor containing a single scalar.
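As a sanity check on the shapes, I can inspect the model defined above directly (input_shape and output_shape are standard Keras model attributes):

print(model.input_shape)   # (None, 2, 3): (batch, timesteps, features)
print(model.output_shape)  # (None, 1): a single scalar per example

So,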

Q1: Is it one-to-one or another type of LSTM network?

When we say "one/many input and one/many output"

Q2: what do we mean by "one/many input/output"? A "one/many" scalar(s), vector(s), sequence(s)..., one/many what?

Q3: Can someone give a simple working example in Keras for each type of the networks: 1-1, 1-M, M-1, and M-M?

PS: I ask multiple questions in a single thread since they are very close and related to each other.

  • https://stackoverflow.com/questions/43034960/many-to-one-and-many-to-many-lstm-examples-in-keras – ralf htp Sep 02 '18 at 16:09
  • @nuric Thanks for the reference; however, this post does not fully answer my question. In particular, is my example one-to-one or many-to-one, and why? If the output from the LSTM is a vector, do we consider the output as many (scalars) or one (vector)? – fade2black Sep 02 '18 at 16:36
  • You consider it as one vector. The terms used in many-to-one etc. refer to timesteps, not features. – nuric Sep 02 '18 at 17:43

1 Answer


The distinction one-to-one, one-to-many, many-to-one, many-to-many only exists for RNNs / LSTMs, i.e. networks that operate on sequential (temporal) data; CNNs operate on spatial data, where this distinction does not exist. So many always involves multiple timesteps / a sequence.

The different species describe the shape of the input and output and their classification. For the input, one means a single input quantity is classified as a closed quantity, and many means a sequence of quantities (i.e. a sequence of images, a sequence of words) is classified as a closed quantity. For the output, one means the output is a scalar (binary classification, i.e. is a bird / is not a bird), 0 or 1; many means the output is a one-hot encoded vector with one dimension per class (multiclass classification, i.e. is a sparrow, is a robin, ...), i.e. for three classes 001, 010, 100:
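For example, Keras' to_categorical produces exactly this encoding (a quick sketch):

from keras.utils import to_categorical
# integer class labels 0, 1, 2 -> one-hot rows [1,0,0], [0,1,0], [0,0,1]
print(to_categorical([0, 1, 2]))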

In the following examples, images and sequences of images are used as the quantity that shall be classified; alternatively you could use words and sequences of words (sentences), or ... :

one-to-one : single images (or words, ...) are classified into a single class (binary classification), i.e. is this a bird or not

one-to-many : single images (or words, ...) are classified into multiple classes

many-to-one : a sequence of images (or words, ...) is classified into a single class (binary classification of a sequence)

many-to-many : a sequence of images (or words, ...) is classified into multiple classes

cf https://www.quora.com/How-can-I-choose-between-one-to-one-one-to-many-many-to-one-many-to-one-and-many-to-many-in-long-short-term-memory-LSTM
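In Keras, the many side of the output corresponds to the LSTM returning one output per timestep rather than only the last one; a minimal sketch of the shape difference (the layer sizes here are arbitrary example values):

from keras.models import Model
from keras.layers import Input, LSTM

t = Input(shape=(4, 3))                    # sequence of 4 timesteps, 3 features each
one = LSTM(2)(t)                           # last output only
many = LSTM(2, return_sequences=True)(t)   # one output per timestep
print(Model(t, one).output_shape)          # (None, 2)
print(Model(t, many).output_shape)         # (None, 4, 2)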


one-to-one (Dense(1) output with its default linear activation, loss=mean_squared_error)

from numpy import array
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
# prepare sequence
length = 5
seq = array([i/float(length) for i in range(length)])
X = seq.reshape(len(seq), 1, 1)
y = seq.reshape(len(seq), 1)
# define LSTM configuration
n_neurons = length
n_batch = length
n_epoch = 1000
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=(1, 1)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
print(model.summary())
# train LSTM
model.fit(X, y, epochs=n_epoch, batch_size=n_batch, verbose=2)
# evaluate
result = model.predict(X, batch_size=n_batch, verbose=0)
for value in result:
    print('%.1f' % value)

source : https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/
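What makes this one-to-one is the input shape (1, 1): each sample is a single timestep with a single feature, and the Dense(1) head emits a single scalar per sample, so both input and output are single quantities.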


one-to-many uses RepeatVector() to transform a single quantity into a sequence, which is what is needed for multiclass classification (below is a minimal sketch based on the source's test case, updated to the Keras 2 API):

from keras.models import Sequential
from keras.layers import LSTM, RepeatVector

number_of_times = 4
model = Sequential()
# RepeatVector turns the single 10-dimensional input vector into a
# sequence of 4 identical timesteps
model.add(RepeatVector(number_of_times, input_shape=(10,)))
# return_sequences=True emits one 3-dimensional output vector per timestep
model.add(LSTM(3, activation='tanh', recurrent_activation='sigmoid',
               return_sequences=True))
model.summary()

source: https://www.programcreek.com/python/example/89689/keras.layers.RepeatVector

alternative one-to-many (number_of_times, input_shape, and output_size are placeholders to fill in):

model = Sequential()
model.add(RepeatVector(number_of_times, input_shape=input_shape))
model.add(LSTM(output_size, return_sequences=True))

source : Many to one and many to many LSTM examples in Keras


many-to-one, binary classification (loss=binary_crossentropy, activation=sigmoid; the dimensionality of the fully-connected output layer is 1 (Dense(1)), so it outputs a scalar, 0 or 1)

from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM

model = Sequential()
# map each of 5000 word indices to a 32-dimensional vector; input sequences are 500 timesteps long
model.add(Embedding(5000, 32, input_length=500))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, y_train, epochs=3, batch_size=64)
# final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
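X_train / y_train and X_test / y_test are assumed to exist above; a hypothetical toy stand-in (in practice this would be e.g. the IMDB dataset of padded word-index sequences):

import numpy as np
from keras.preprocessing.sequence import pad_sequences

# hypothetical data: 100 documents, each a variable-length list of word indices < 5000
docs = [np.random.randint(1, 5000, size=np.random.randint(5, 500)) for _ in range(100)]
X_train = pad_sequences(docs, maxlen=500)    # shape (100, 500)
y_train = np.random.randint(0, 2, size=100)  # binary labels, 0 or 1
# X_test / y_test would be built the same way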

many-to-many, multiclass classification (loss=sparse_categorical_crossentropy, activation=softmax; with this sparse loss the targets are integer class labels, whereas plain categorical_crossentropy would need one-hot encoded targets; the dimensionality of the fully-connected output layer is 7 (Dense(7)), so it outputs a 7-dimensional vector with one entry per class):

from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM

model = Sequential()
model.add(Embedding(5000, 32, input_length=500))
model.add(LSTM(100, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(7, activation='softmax'))
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

model.summary()
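Since the loss is sparse_categorical_crossentropy, the model can be fit directly on integer class labels; a hypothetical usage sketch with random stand-in data:

import numpy as np

# 100 sequences of 500 word indices; labels are integer class indices 0..6
X = np.random.randint(1, 5000, size=(100, 500))
y = np.random.randint(0, 7, size=100)
model.fit(X, y, epochs=1, batch_size=32)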

cf Keras LSTM multiclass classification

Alternative many-to-many using the TimeDistributed layer, cf https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/ for a description:

from numpy import array
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import TimeDistributed
from keras.layers import LSTM
# prepare sequence
length = 5
seq = array([i/float(length) for i in range(length)])
X = seq.reshape(1, length, 1)
y = seq.reshape(1, length, 1)
# define LSTM configuration
n_neurons = length
n_batch = 1
n_epoch = 1000
# create LSTM
model = Sequential()
model.add(LSTM(n_neurons, input_shape=(length, 1), return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(loss='mean_squared_error', optimizer='adam')
print(model.summary())
# train LSTM
model.fit(X, y, epochs=n_epoch, batch_size=n_batch, verbose=2)
# evaluate
result = model.predict(X, batch_size=n_batch, verbose=0)
for value in result[0,:,0]:
    print('%.1f' % value)

source : https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/
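Here the whole input sequence of 5 timesteps goes in (input_shape=(length, 1)), and return_sequences=True together with TimeDistributed(Dense(1)) produces one output per timestep, i.e. a sequence in and a sequence out.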

  • Thanks for the response! Could you clarify what you mean by "closed quantity"? – fade2black Sep 02 '18 at 18:57
  • Also, I don't think that it is the Dense layer that makes the first example one-to-one. Is it because the input shape is (1,1)? The NN would successfully output a single scalar even without the Dense layer. My question is not about how and where to use these species of networks (images, words, etc., though useful knowledge), but about what makes an LSTM NN one-to-one or many-to-one. Input shape? Output shape? In particular, in the one-to-one example, will it still be one-to-one if I change the input shape from (1,1) to (1,2) or (2,1)? – fade2black Sep 02 '18 at 19:14
  • Dense is always a fully-connected layer, which is why it is used in almost any model as the output layer. *Closed quantity* means *as a unit*: either you choose a single quantity as a unit or multiple quantities as a unit. You choose the architecture depending on the problem, i.e. if you want to classify sentences into multiple classes you choose many-to-many; if you want to detect whether a single image contains a bird or not you choose one-to-one. If you change the input shape the network can still be one-to-one, however you are not allowed to change the number of timesteps; this has to stay fixed at 1. – ralf htp Sep 02 '18 at 20:03
  • I have noticed by experimenting that if I define LSTM(4) then it outputs a vector of 4 scalars, but when we connect it to a Dense layer then the whole network outputs a single value because of the Dense layer. Alternatively we could define LSTM(1) with input shape (1,1) if we want a single input and a single output, without any other layer. Right? – fade2black Sep 02 '18 at 20:10
  • As for one-to-many, I could define a single layer LSTM(3) with input shape (1,1). In this case the NN would accept a single scalar as input and output 3 scalars as a single vector. Is this one input and many outputs? – fade2black Sep 02 '18 at 20:16
  • Only if you specify Dense(1). Dense is fully-connected, a special layer type; search the net for this. In case of a *many* output, the vector is always one-hot encoded. – ralf htp Sep 02 '18 at 20:24
  • I thought one or many depends ONLY on the number of scalars in a vector, not on whether a vector is one-hot or dense. I am confused more than before. In the article I mention in the post there is no reference to other types of layers (Dense, RepeatVector, etc.), and these species of LSTM networks are explained purely in terms of the LSTM itself. – fade2black Sep 02 '18 at 20:38
  • Yes, *one* or *many* depends on the dimensionality (number of scalars) of the output vector. The point is that the output vector always has to be one-hot encoded in an RNN / LSTM for classification. This does not conflict with the use of a Dense layer. – ralf htp Sep 03 '18 at 04:43