Questions tagged [lstm]

Long short-term memory. A neural network (NN) architecture containing recurrent NN blocks that can remember a value for an arbitrary length of time. A very popular building block for deep NNs.

Long short-term memory networks (LSTMs) are a type of recurrent neural network. They can take time-series data and make predictions using knowledge of how the system is evolving.

A major benefit of LSTMs is their ability to store and use long-term information, not just what they are given at a particular instant. For more information on LSTMs, see colah's blog post and MachineLearningMastery.
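To make the "remember a value" idea concrete, here is a minimal NumPy sketch of a single LSTM cell step. The gate ordering, the stand-in dimensions, and the random weights are all illustrative assumptions, not any particular library's internals.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.
    W: (4*units, features), U: (4*units, units), b: (4*units,)
    Gate order assumed here: input, forget, candidate, output.
    """
    units = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0 * units:1 * units])   # input gate
    f = sigmoid(z[1 * units:2 * units])   # forget gate
    g = np.tanh(z[2 * units:3 * units])   # candidate cell state
    o = sigmoid(z[3 * units:4 * units])   # output gate
    c = f * c_prev + i * g                # new cell state (the "memory")
    h = o * np.tanh(c)                    # new hidden state
    return h, c

rng = np.random.default_rng(0)
features, units = 3, 4
x = rng.normal(size=features)
h, c = np.zeros(units), np.zeros(units)
W = rng.normal(size=(4 * units, features))
U = rng.normal(size=(4 * units, units))
b = np.zeros(4 * units)
h, c = lstm_step(x, h, c, W, U, b)
print(h.shape, c.shape)  # (4,) (4,)
```

The forget gate `f` is what lets the cell carry `c_prev` forward unchanged for many steps, which is the mechanism behind the long-term storage described above.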

6289 questions
406
votes
4 answers

Understanding Keras LSTMs

I am trying to reconcile my understanding of LSTMs, as described in this post by Christopher Olah, with their implementation in Keras. I am following the blog written by Jason Brownlee for the Keras tutorial. What I am mainly confused about is the reshaping…
sachinruk
  • 9,571
  • 12
  • 55
  • 86
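The reshaping that question asks about usually comes down to the 3-D input layout Keras recurrent layers expect. A small NumPy sketch (the sizes are hypothetical):

```python
import numpy as np

# 10 sequences, each 1000 steps long, one value per step (hypothetical sizes)
flat = np.arange(10 * 1000, dtype=np.float32)

# Keras LSTM layers expect 3-D input: (samples, timesteps, features)
x = flat.reshape(10, 1000, 1)
print(x.shape)  # (10, 1000, 1)
```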
152
votes
17 answers

Tensorflow - ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float)

Continuation from previous question: Tensorflow - TypeError: 'int' object is not iterable My training data is a list of lists each comprised of 1000 floats. For example, x_train[0] = [0.0, 0.0, 0.1, 0.25, 0.5, ...] Here is my model: model =…
SuperHanz98
  • 2,090
  • 2
  • 16
  • 33
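A common cause of that ValueError is feeding `model.fit` a Python list of lists instead of a homogeneous array. A hedged sketch of the usual fix, with toy stand-in data:

```python
import numpy as np

x_train = [[0.0, 0.0, 0.1, 0.25, 0.5],
           [0.2, 0.3, 0.1, 0.0, 0.9]]  # toy stand-in for the real data

# A common fix: force a homogeneous float32 array before calling model.fit.
x = np.asarray(x_train, dtype=np.float32)
print(x.shape, x.dtype)  # (2, 5) float32

# If the inner lists had unequal lengths, np.asarray would hit the same
# "Unsupported object type" problem; pad them to one length first.
```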
109
votes
5 answers

What's the difference between "hidden" and "output" in PyTorch LSTM?

I'm having trouble understanding the documentation for PyTorch's LSTM module (and also RNN and GRU, which are similar). Regarding the outputs, it says: Outputs: output, (h_n, c_n) output (seq_len, batch, hidden_size * num_directions): tensor…
N. Virgo
  • 7,970
  • 11
  • 44
  • 65
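The distinction that question asks about can be shown with a toy recurrence (a stand-in step, not a real LSTM): `output` stacks the hidden state at every timestep, while `h_n` is only the final one.

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, units = 5, 3

def rnn_step(x, h):
    # stand-in for one recurrent step (a real LSTM also carries a cell state c)
    return np.tanh(x + 0.5 * h)

h = np.zeros(units)
outputs = []
for t in range(seq_len):
    h = rnn_step(rng.normal(size=units), h)
    outputs.append(h)

output = np.stack(outputs)   # like PyTorch's `output`: (seq_len, units)
h_n = h                      # like `h_n`: hidden state of the LAST step only
print(np.allclose(output[-1], h_n))  # True
```

In PyTorch the same relation holds per layer and direction: `output[-1]` equals `h_n` for a single-layer, unidirectional LSTM.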
102
votes
2 answers

What is the intuition of using tanh in LSTM?

In an LSTM network (Understanding LSTMs), why do the input gate and output gate use tanh? What is the intuition behind this? Is it just a nonlinear transformation? If so, can I change both to another activation function (e.g., ReLU)?
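For reference, a standard formulation of the updates (notation follows common LSTM write-ups such as colah's post). The gates themselves use the sigmoid, bounded in (0, 1); tanh appears in two places, squashing the candidate update and squashing the cell state before the output gate:

```latex
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i), \quad
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f), \quad
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)

c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)

h_t = o_t \odot \tanh(c_t)
```

Because $c_t$ is an unbounded running sum, $\tanh$ re-bounds it to $(-1, 1)$ before it leaves the cell; an unbounded activation like ReLU in that position can let activations grow without limit across timesteps.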
99
votes
2 answers

Keras: the difference between LSTM dropout and LSTM recurrent dropout

From the Keras documentation: dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs. recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the…
Alonzorz
  • 2,113
  • 4
  • 18
  • 21
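The difference asked about here is which signal gets masked: `dropout` masks the input $x_t$, `recurrent_dropout` masks the recurrent term $h_{t-1}$. A hedged NumPy sketch, with both masks held fixed across timesteps (variational-style dropout in the spirit of Gal and Ghahramani; the toy `tanh` step stands in for the LSTM):

```python
import numpy as np

rng = np.random.default_rng(2)
timesteps, units, rate = 4, 6, 0.5

# `dropout` corresponds to a mask on the input x_t;
# `recurrent_dropout` to a mask on the recurrent term h_{t-1}.
# Inverted-dropout scaling keeps the expected activation unchanged.
input_mask = (rng.random(units) > rate) / (1.0 - rate)
recurrent_mask = (rng.random(units) > rate) / (1.0 - rate)

h = np.zeros(units)
for t in range(timesteps):
    x_t = np.ones(units) * input_mask    # some input units dropped
    h_in = h * recurrent_mask            # some recurrent units dropped
    h = np.tanh(x_t + h_in)              # stand-in for the LSTM step
print(h.shape)  # (6,)
```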
98
votes
5 answers

What's the difference between a bidirectional LSTM and an LSTM?

Can someone please explain this? I know bidirectional LSTMs have a forward and backward pass but what is the advantage of this over a unidirectional LSTM? What is each of them better suited for?
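The advantage can be stated mechanically: a bidirectional LSTM runs one pass left-to-right and a second pass right-to-left, then typically concatenates the two hidden sequences, so every position sees both past and future context. A NumPy sketch with a stand-in recurrent step:

```python
import numpy as np

def run(seq, step):
    """Run a recurrent step over a sequence; return all hidden states."""
    h = np.zeros(seq.shape[1])
    outs = []
    for x in seq:
        h = step(x, h)
        outs.append(h)
    return np.stack(outs)

rng = np.random.default_rng(3)
seq = rng.normal(size=(5, 4))                # (timesteps, features)
step = lambda x, h: np.tanh(x + 0.5 * h)     # stand-in for an LSTM step

fwd = run(seq, step)                         # past -> future
bwd = run(seq[::-1], step)[::-1]             # future -> past, re-aligned
bi = np.concatenate([fwd, bwd], axis=-1)     # (timesteps, 2 * units)
print(bi.shape)  # (5, 8)
```

This is why bidirectional LSTMs suit offline tasks (tagging, classification of complete sequences) but not streaming prediction, where the backward pass would need future inputs that do not exist yet.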
97
votes
4 answers

How to stack multiple lstm in keras?

I am using the deep learning library Keras and trying to stack multiple LSTMs, with no luck. Below is my code: model = Sequential() model.add(LSTM(100,input_shape =(time_steps,vector_size))) model.add(LSTM(100)) The above code returns error in the third…
Tamim Addari
  • 7,591
  • 9
  • 40
  • 59
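The usual cause of that error is that the first LSTM returns only its final hidden state (2-D), while the next LSTM needs a 3-D sequence. The standard fix is `return_sequences=True` on every LSTM except the last; a sketch with hypothetical sizes:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

time_steps, vector_size = 10, 8  # hypothetical sizes

model = Sequential()
# return_sequences=True makes the layer emit its hidden state at EVERY
# timestep (3-D output), which is what the next LSTM layer expects.
model.add(LSTM(100, return_sequences=True,
               input_shape=(time_steps, vector_size)))
model.add(LSTM(100))  # last LSTM: returns only the final state (2-D)
print(model.output_shape)  # (None, 100)
```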
86
votes
4 answers

In Keras, what exactly am I configuring when I create a stateful `LSTM` layer with N `units`?

The first argument in a normal Dense layer is also units, and is the number of neurons/nodes in that layer. A standard LSTM unit, however, looks like the following: (This is a reworked version of "Understanding LSTM Networks") In Keras, when I…
André C. Andersen
  • 8,955
  • 3
  • 53
  • 79
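In short, `units` is the dimensionality of the hidden state h and cell state c. Because an LSTM has four gates, the layer packs four weight matrices together, which shows up in the parameter shapes. A NumPy sketch of the shapes (following the Keras convention of packing gates along the last axis; sizes are hypothetical):

```python
import numpy as np

features, units = 3, 16  # hypothetical sizes

# Shapes of a single LSTM layer's parameters (the four gates i, f, c, o
# are packed along the last axis):
kernel = np.zeros((features, 4 * units))          # acts on the input x_t
recurrent_kernel = np.zeros((units, 4 * units))   # acts on h_{t-1}
bias = np.zeros(4 * units)

n_params = kernel.size + recurrent_kernel.size + bias.size
print(n_params)  # 4 * units * (features + units + 1) = 1280
```

This matches the parameter count Keras reports in `model.summary()` for an LSTM layer with these sizes.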
64
votes
2 answers

How to use return_sequences option and TimeDistributed layer in Keras?

I have a dialog corpus like below. And I want to implement a LSTM model which predicts a system action. The system action is described as a bit vector. And a user input is calculated as a word-embedding which is also a bit vector. t1: user: "Do you…
jef
  • 3,890
  • 10
  • 42
  • 76
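The mechanics behind that question: with `return_sequences=True` the LSTM emits one vector per timestep, and `TimeDistributed(Dense(k))` applies the same Dense weights to each timestep independently, which is equivalent to one matrix product over the whole sequence. A NumPy illustration (sizes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
timesteps, units, n_actions = 5, 8, 3

seq = rng.normal(size=(timesteps, units))   # LSTM output with return_sequences=True
W = rng.normal(size=(units, n_actions))     # one shared Dense kernel
b = np.zeros(n_actions)

# TimeDistributed(Dense): the same W, b applied at every timestep...
per_step = np.stack([seq[t] @ W + b for t in range(timesteps)])
# ...which is just a single matrix product over the sequence:
vectorized = seq @ W + b
print(np.allclose(per_step, vectorized))  # True
```

So for a per-timestep system-action prediction, the pattern is `LSTM(units, return_sequences=True)` followed by `TimeDistributed(Dense(n_actions))`.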
62
votes
7 answers

Tensorflow Data Adapter Error: ValueError: Failed to find data adapter that can handle input

I was running a sentdex tutorial script of a cryptocurrency RNN (YouTube tutorial: Cryptocurrency-predicting RNN Model) but encountered an error when attempting to train the model. My tensorflow version is 2.0.0 and I'm running python…
Jonathan E
  • 631
  • 1
  • 5
  • 5
62
votes
4 answers

How do I create a variable-length input LSTM in Keras?

I am trying to do some vanilla pattern recognition with an LSTM using Keras to predict the next element in a sequence. My data look like this: where the label of the training sequence is the last element in the list:…
erip
  • 16,374
  • 11
  • 66
  • 121
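One common approach to variable-length input is to pad all sequences to a shared length (what Keras' `pad_sequences` utility does) and let a `Masking(mask_value=0.)` layer tell the LSTM to skip the padding. A NumPy sketch of the padding step with toy data:

```python
import numpy as np

seqs = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]   # toy variable-length sequences

max_len = max(len(s) for s in seqs)
# Pre-pad with zeros to a common length; a Masking(mask_value=0.) layer
# can then make the LSTM ignore the leading zeros.
padded = np.zeros((len(seqs), max_len), dtype=np.float32)
for i, s in enumerate(seqs):
    padded[i, max_len - len(s):] = s
print(padded)
# [[0. 1. 2. 3.]
#  [0. 0. 4. 5.]
#  [6. 7. 8. 9.]]
```

The alternative is to declare `input_shape=(None, features)` and feed one sequence (or one bucket of equal-length sequences) per batch, trading padding for smaller batches.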
59
votes
11 answers

What is num_units in tensorflow BasicLSTMCell?

In MNIST LSTM examples, I don't understand what "hidden layer" means. Is it the imaginary-layer formed when you represent an unrolled RNN over time? Why is the num_units = 128 in most cases ?
Subrat
  • 980
  • 2
  • 11
  • 17
56
votes
5 answers

When does keras reset an LSTM state?

I read all sorts of texts about it, and none seem to answer this very basic question. It's always ambiguous: In a stateful = False LSTM layer, does keras reset states after: Each sequence; or Each batch? Suppose I have X_train shaped as…
Daniel Möller
  • 84,878
  • 18
  • 192
  • 214
53
votes
5 answers

What is the difference between CuDNNLSTM and LSTM in Keras?

In Keras, the high-level deep learning library, there are multiple types of recurrent layers; these include LSTM (Long short term memory) and CuDNNLSTM. According to the Keras documentation, a CuDNNLSTM is a: Fast LSTM implementation backed by…
krismath
  • 1,879
  • 2
  • 23
  • 41
48
votes
3 answers

Keras - stateful vs stateless LSTMs

I'm having a hard time conceptualizing the difference between stateful and stateless LSTMs in Keras. My understanding is that at the end of each batch, the "state of the network is reset" in the stateless case, whereas for the stateful case, the…
vgoklani
  • 10,685
  • 16
  • 63
  • 101
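The stateful/stateless contrast can be sketched without Keras at all. With `stateful=False`, every batch starts from a zero state; with `stateful=True`, batch k+1 continues from batch k's final state (so sample i of each batch must continue sample i of the previous one) until the state is explicitly reset. A toy recurrence stands in for the LSTM:

```python
import numpy as np

def run_batch(batch, h0):
    """Stand-in recurrent pass over one batch; returns the final hidden state."""
    h = h0
    for t in range(batch.shape[1]):
        h = np.tanh(batch[:, t] + 0.5 * h)
    return h

rng = np.random.default_rng(5)
batch_size, timesteps, units = 2, 3, 4
batches = [rng.normal(size=(batch_size, timesteps, units)) for _ in range(3)]

# stateful=False: every batch starts over from a zero state
for b in batches:
    h_stateless = run_batch(b, np.zeros((batch_size, units)))

# stateful=True: batch k+1 continues from batch k's final state
h = np.zeros((batch_size, units))
for b in batches:
    h = run_batch(b, h)

h = np.zeros((batch_size, units))  # the analogue of model.reset_states()
print(h.sum())  # 0.0
```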