2

Although I have good understanding of beam search but I have a query regarding beam search. When we select n best paths should we sort them or simply we should keep them in the order in which they exist and just discard other expensive nodes?

I searched a lot about this but every where it says that keep best. Nothing is found about should we sort them or not?

I think that we should sort them because by applying sorting we will reach to goal node quickly. But I want confirmation of my sorting idea and I did not found it till now.

I will be thankful to you if you can help me in improving my concepts.

Hammad Hassan
  • 1,192
  • 17
  • 29

2 Answers2

1

When we select n best paths should we sort them or simply we should keep them in the order in which they exist and just discard other expensive nodes?

We just sort them and keep the top k.

At each step after the initialization you sort the beam_size * vocabulary_size hypotheses and choose the top k. For each of the beam_size * vocabulary_size hypotheses, its weight/probability is the product of all probabilities along its history normalized by the length(length normalization).

One problem arises from the fact that the completed hypotheses may have different lengths. Because models generally assign lower probabilities to longer strings, a naive algorithm would also choose shorter strings for y. This was not an issue during the earlier steps of decoding; due to the breadth-first nature of beam search all the hypotheses being compared had the same length. The usual solution to this is to apply some form of length normalization to each of the hypotheses, for example simply dividing the negative log probability by the number of words: enter image description here

For more information please refer to this answer.

Reference: https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf

Lerner Zhang
  • 6,184
  • 2
  • 49
  • 66
-1

****Beam search uses breadth-first search to build its search tree. At each level of the tree, it generates all successors of the states at the current level, ***

sorting them in increasing order of heuristic cost

***. However, it only stores a predetermined number of best states at each level (called the beam width). Only those states are expanded next. The greater the beam width, the fewer states are pruned. With an infinite beam width, no states are pruned and beam search is identical to breadth-first search. NOTE: (I got this information from WikipediA during my search.)may be its helpful.****

hum_Na
  • 1
  • 1