Questions tagged [deep-learning]

Deep Learning is an area of machine learning whose goal is to learn complex functions using special neural network architectures that are "deep" (consist of many layers). This tag should be used for questions about implementation of deep learning architectures. General machine learning questions should be tagged "machine learning". Including a tag for the relevant software library (e.g., "keras", "tensorflow", "pytorch", "fast.ai", etc.) is helpful.

Deep Learning is a branch of machine learning aimed at learning complex functions using special neural network architectures with many layers (hence the term "deep").

Deep neural network architectures can learn more complex tasks: beyond simply having more layers to perform transformations, the greater depth and richer architecture allow a hierarchical organization of functionality to emerge.

Deep Learning was introduced into machine learning research with the intention of moving machine learning closer to artificial intelligence. A significant impact of deep learning lies in feature learning, which removes much of the manual feature engineering effort that non-deep approaches require.

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead; otherwise your question is probably off-topic. Please choose one site only and do not cross-post to more than one - see Is cross-posting a question on multiple Stack Exchange sites permitted if the question is on-topic for each site? (tl;dr: no).

27406 questions

476 votes, 14 answers

Epoch vs Iteration when training neural networks

What is the difference between epoch and iteration when training a multi-layer perceptron?
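
As a rough illustration of the usual convention (an epoch is one full pass over the training set, an iteration is one batch update), here is a minimal sketch. The dataset size, batch size, and the helper `iterations_per_epoch` are hypothetical, and the sketch assumes the last partial batch is kept rather than dropped.

```python
import math

def iterations_per_epoch(num_samples, batch_size):
    """Number of batch updates needed to see every sample once."""
    # Assumption: a final, smaller batch still counts as one iteration.
    return math.ceil(num_samples / batch_size)

num_samples = 1000   # hypothetical dataset size
batch_size = 32      # hypothetical batch size
epochs = 5           # hypothetical number of full passes

per_epoch = iterations_per_epoch(num_samples, batch_size)
total_iterations = per_epoch * epochs
print(per_epoch, total_iterations)  # 32 160
```

Frameworks that drop the last incomplete batch would use floor division instead, so the exact count depends on the data loader's settings.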

432 votes, 16 answers

What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of tensorflow?

What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool of tensorflow? In my opinion, 'VALID' means there will be no zero padding outside the edges when we do max pool. According to A guide to convolution arithmetic for deep…
karl_TUM • 5,769 • 10 • 24 • 41
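
The difference can be seen in the output-size arithmetic. The sketch below uses the formulas given in the TensorFlow documentation for one spatial dimension; the helper name `pooled_size` is hypothetical and this is plain Python, not a TensorFlow call.

```python
import math

def pooled_size(input_size, window, stride, padding):
    """Output length along one spatial dimension for a pooling window."""
    if padding == "SAME":
        # Input is padded as needed so every position is covered.
        return math.ceil(input_size / stride)
    if padding == "VALID":
        # No padding: windows that would run past the edge are dropped.
        return math.ceil((input_size - window + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

print(pooled_size(5, 2, 2, "VALID"))  # 2
print(pooled_size(5, 2, 2, "SAME"))   # 3
```

With an input of length 5, a window of 2, and a stride of 2, 'VALID' discards the last element while 'SAME' pads so that it is still pooled.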

429 votes, 10 answers

What is the meaning of the word logits in TensorFlow?

In the following TensorFlow function, we must feed the activation of artificial neurons in the final layer. That I understand. But I don't understand why it is called logits? Isn't that a mathematical function? loss_function =…
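
In deep learning usage, "logits" commonly refers to the raw, unnormalized scores of the final layer, before softmax turns them into probabilities. The sketch below is a plain-Python softmax under that assumption; the input scores are made up for illustration and no TensorFlow function is used.

```python
import math

def softmax(logits):
    """Convert unnormalized scores into probabilities that sum to 1."""
    m = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                  # hypothetical final-layer outputs
probs = softmax(logits)
print(probs)
```

The ordering of the scores is preserved: the largest logit always maps to the largest probability.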

414 votes, 3 answers

Keras input explanation: input_shape, units, batch_size, dim, etc

For any Keras layer (Layer class), can someone explain how to understand the difference between input_shape, units, dim, etc.? For example the doc says units specify the output shape of a layer. In the image of the neural net below hidden layer1…
scarecrow • 6,624 • 5 • 20 • 39

406 votes, 4 answers

Understanding Keras LSTMs

I am trying to reconcile my understanding of LSTMs, as described in this post by Christopher Olah, with how they are implemented in Keras. I am following the blog written by Jason Brownlee for the Keras tutorial. What I am mainly confused about is, The reshaping…
sachinruk • 9,571 • 12 • 55 • 86

376 votes, 10 answers

How do I save a trained model in PyTorch?

How do I save a trained model in PyTorch? I have read that: torch.save()/torch.load() is for saving/loading a serializable object. model.state_dict()/model.load_state_dict() is for saving/loading model state.
Wasi Ahmad • 35,739 • 32 • 114 • 161

331 votes, 7 answers

Why do we need to call zero_grad() in PyTorch?

Why does zero_grad() need to be called during training? The docstring reads: zero_grad(self): Sets gradients of all model parameters to zero.
user1424739 • 11,937 • 17 • 63 • 152

271 votes, 3 answers

How to interpret loss and accuracy for a machine learning model

When I train my neural network with Theano or TensorFlow, they report a variable called "loss" per epoch. How should I interpret this variable? Is a higher loss better or worse, and what does it mean for the final performance (accuracy) of my…
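
As a hedged illustration of how the two numbers relate, the sketch below computes average cross-entropy loss and accuracy for two made-up sets of predictions. The probabilities, labels, and helper names are hypothetical; the point is only that lower loss is better and that two models can share the same accuracy while having different losses.

```python
import math

def cross_entropy(probs, labels):
    """Average negative log-probability assigned to the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class is the true class."""
    correct = sum(1 for p, y in zip(probs, labels) if p.index(max(p)) == y)
    return correct / len(labels)

labels = [0, 1]                            # hypothetical true classes
confident = [[0.9, 0.1], [0.2, 0.8]]       # correct and confident
hesitant = [[0.6, 0.4], [0.4, 0.6]]        # correct but less confident

print(accuracy(confident, labels), cross_entropy(confident, labels))
print(accuracy(hesitant, labels), cross_entropy(hesitant, labels))
```

Both prediction sets classify every sample correctly, but the less confident one has the higher loss, which is why loss is usually the more sensitive training signal.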

254 votes, 10 answers

How do I initialize weights in PyTorch?

How do I initialize weights and biases of a network (via e.g. He or Xavier initialization)?
Fábio Perez • 23,850 • 22 • 76 • 100
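
For reference, the scaling rules behind the two schemes named in the question can be sketched in plain Python. This is not PyTorch code (PyTorch ships its own initializers in `torch.nn.init`); the function names and layer sizes here are hypothetical, and the formulas are the commonly cited ones: Xavier uniform draws from ±sqrt(6 / (fan_in + fan_out)), He normal uses a standard deviation of sqrt(2 / fan_in).

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng):
    """Weights drawn uniformly from [-limit, limit], limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_in)] for _ in range(fan_out)]

def he_normal(fan_in, fan_out, rng):
    """Weights drawn from a normal distribution with std = sqrt(2 / fan_in)."""
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]

rng = random.Random(0)                     # fixed seed for repeatability
w_xavier = xavier_uniform(4, 3, rng)       # hypothetical layer: 4 inputs, 3 outputs
w_he = he_normal(4, 3, rng)
biases = [0.0] * 3                         # biases are often simply set to zero
```

Each matrix has one row per output unit and one column per input unit, matching the (out_features, in_features) layout used for linear layers.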

246 votes, 4 answers

What does model.eval() do in pytorch?

When should I use .eval()? I understand it is supposed to allow me to "evaluate my model". How do I turn it back off for training? Example training code using .eval().
Gulzar • 23,452 • 27 • 113 • 201

229 votes, 13 answers

Keras, How to get the output of each layer?

I have trained a binary classification model with CNN, and here is my code model = Sequential() model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1], border_mode='valid', …
GoingMyWay • 16,802 • 32 • 96 • 149

215 votes, 12 answers

Why binary_crossentropy and categorical_crossentropy give different performances for the same problem?

I'm trying to train a CNN to categorize text by topic. When I use binary cross-entropy I get ~80% accuracy, with categorical cross-entropy I get ~50% accuracy. I don't understand why this is. It's a multiclass problem, doesn't that mean that I have…

195 votes, 4 answers

Intuitive understanding of 1D, 2D, and 3D convolutions in convolutional neural networks

Can anyone please clearly explain the difference between 1D, 2D, and 3D convolutions in convolutional neural networks (in deep learning) with the use of examples?
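
One way to see the difference is by the number of axes the kernel slides along. The sketch below is a plain-Python illustration with no padding and stride 1, using the cross-correlation form that deep learning libraries typically call "convolution"; the function names and sample values are hypothetical.

```python
def conv1d(signal, kernel):
    """Slide the kernel along one axis (e.g. time steps of a sequence)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def conv2d(image, kernel):
    """Slide the kernel along two axes (e.g. height and width of an image)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

print(conv1d([1, 2, 3, 4], [1, 0, -1]))          # [-2, -2]
print(conv2d([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
             [[1, 1], [1, 1]]))                  # [[12, 16], [24, 28]]
```

A 3D convolution follows the same pattern with a third sliding axis, for example depth in a volume or time in a video.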

192 votes, 13 answers

Why must a nonlinear activation function be used in a backpropagation neural network?

I've been reading some things on neural networks and I understand the general principle of a single layer neural network. I understand the need for additional layers, but why are nonlinear activation functions used? This question is followed by this…
corazza • 31,222 • 37 • 115 • 186
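
A small numeric check makes the standard argument concrete: without a nonlinearity, two stacked linear layers are equivalent to a single linear layer whose weight matrix is the product of the two. The weights and input below are made up for illustration, biases are omitted for brevity, and ReLU is used only as one example of a nonlinear activation.

```python
def matvec(W, x):
    """Multiply matrix W by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def relu(v):
    """Elementwise max(0, a)."""
    return [max(0.0, a) for a in v]

W1 = [[1.0, 2.0], [3.0, 4.0]]   # hypothetical first-layer weights
W2 = [[0.0, 1.0], [1.0, 0.0]]   # hypothetical second-layer weights
x = [1.0, -1.0]                 # hypothetical input

stacked = matvec(W2, matvec(W1, x))          # two linear layers
collapsed = matvec(matmul(W2, W1), x)        # one equivalent linear layer
with_relu = matvec(W2, relu(matvec(W1, x)))  # nonlinearity in between

print(stacked, collapsed, with_relu)  # [-1.0, -1.0] [-1.0, -1.0] [0.0, 0.0]
```

The first two results match, so the extra linear layer adds no expressive power; inserting ReLU gives a different result that no single matrix applied to `x` in the same way would reproduce in general.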

186 votes, 10 answers

What is the role of "Flatten" in Keras?

I am trying to understand the role of the Flatten function in Keras. Below is my code, which is a simple two-layer network. It takes in 2-dimensional data of shape (3, 2), and outputs 1-dimensional data of shape (1, 4): model =…
Karnivaurus • 22,823 • 57 • 147 • 247