5

Given a state vector we can recursively decode a sequence in a greedy manner by generating each output successively, where each prediction is conditioned on the previous output. I read a paper recently that described using beam search during decoding with a beam size of 1 (k=1). If we are only retaining the best output at each step, isn't this just the same thing as greedy decoding, and offers none of the benefits typically provided by beam search?

jstaker7
  • 1,226
  • 4
  • 15
  • 29

1 Answers1

5

Finally found an answer: beam size of 1 is the same as greedy search.

From "Abstractive Sentence Summarization with Attentive Recurrent Neural Networks":

"k refers to the size of the beam for generation; k = 1 implies greedy generation."
jstaker7
  • 1,226
  • 4
  • 15
  • 29