
I am trying to follow the tutorial for Language Modeling on the TensorFlow site. I see it runs and the cost goes down, so it is working great, but I do not see any way to actually get predictions from the model. I tried following the instructions at this answer, but the tensors returned from `session.run` are floating point values like 0.017842259, and the dictionary maps words to integers, so that does not work.

How can I get the predicted word from a TensorFlow model?

Edit: I found this explanation after searching around; I'm just not sure what `x` and `y` would be in the context of this example. They don't seem to use the same conventions in this example as they do in the explanation.


2 Answers


The tensor you are mentioning is the loss, which defines how the network trains. For prediction, you need to access the `probabilities` tensor, which contains the probabilities for the next word. If this were a classification problem, you'd just take the argmax to get the top probability. But to also give lower-probability words a chance of being generated, some kind of sampling is often used.
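To make the difference concrete, here is a small sketch (using NumPy and a made-up four-word distribution, not the tutorial's actual tensors) contrasting greedy argmax decoding with sampling in proportion to the probabilities:

```python
import numpy as np

# Hypothetical next-word probability distribution over a tiny vocabulary.
probs = np.array([0.6, 0.25, 0.1, 0.05])

# Greedy decoding: always pick the single highest-probability word id.
greedy_id = int(np.argmax(probs))

# Sampling: draw a word id in proportion to its probability, so
# lower-probability words still have a chance of being generated.
sampled_id = int(np.random.choice(len(probs), p=probs))
```

Greedy decoding makes generated text repetitive; sampling trades a little likelihood for variety, which is why it is the common choice for text generation.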

Edit: I assume the code you used is this. In that case, look at line 148: the `logits` tensor there can be converted into probabilities by simply applying the softmax function to it, as shown in the pseudocode on the TensorFlow website. Hope this helps.
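For reference, softmax just exponentiates the logits and normalizes them so they sum to 1. A minimal NumPy sketch (the logit values are made up; in the tutorial you would apply `tf.nn.softmax` to the `logits` tensor instead):

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical logits for one time step over a 4-word vocabulary.
logits = np.array([2.0, 1.0, 0.1, -1.0])
probs = softmax(logits)  # non-negative, sums to 1
```

Note that softmax preserves the ordering of the logits, so the argmax of the probabilities is the same as the argmax of the logits.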

  • Not 100% sure what you mean by this. The result of `session.run` is a 3 element list with the elements being as follows: cost: a float32 that I assume is the average cost of the model, eval_op: the function used to evaluate the model, and final_state: a list of LSTMStateTuples – jbird Oct 17 '16 at 20:17
  • Which one of those contains the probabilities? – jbird Oct 17 '16 at 20:17
  • One more thing to mention, these LSTMStateTuples contain negative values so I am assuming they are not probabilities (but also I'm not sure how something would have negative loss) – jbird Oct 17 '16 at 20:18
  • It does not necessarily have to be one of the outputs of the network. In the example you have linked, there is a probabilities tensor. You can access the values of that tensor [multiple ways](http://stackoverflow.com/questions/33633370/how-to-print-the-value-of-a-tensor-object-in-tensorflow) – Prophecies Oct 17 '16 at 21:03
  • From your edit, which talks about simple MNIST example, I feel like it would help you a lot to go through the basic mechanics of tensorflow first. Tensorflow has very good documentation. – Prophecies Oct 17 '16 at 21:05
  • So I think you answered the wrong question there. I'm asking how to access the probabilities tensor. I can print the values from it once I have it, I just don't know where it is since session.run only has `cost`, `eval_op` and `final_state`. So, how do I get probabilities from these variables? – jbird Oct 18 '16 at 13:28
  • Ah, I think I see where you are getting confused. In the documentation, it includes the `probabilities` variable in the pseudocode but if you look at the actual code on git, there is no `probabilities` variable – jbird Oct 18 '16 at 14:27
  • Ahh, but the code posted is also pretty similar. It has `logits` in there which is just a [softmax](https://en.wikipedia.org/wiki/Softmax_function) function away from getting the probs. See my edit. – Prophecies Oct 18 '16 at 21:17
  • I see what you mean there. I did not make that connection beforehand, but that is really helpful and now I see how I can make this work with the examples that are there. Thank you! – jbird Oct 19 '16 at 13:55

So after going through a bunch of other similar posts, I figured this out. First, the code explained in the documentation is not the same as the code in the GitHub repository. The current code works by initializing models with the data inside them, instead of feeding data to the model as it runs.

So basically, to accomplish what I was trying to do, I reverted my code to commit 9274f5a (and did the same for `reader.py`). Then I followed the steps taken in this post to get the `probabilities` tensor into my `run_epoch` function. Additionally, I followed this answer to pass the vocabulary to my `main` function. From there, I inverted the dict using `vocabulary = {v: k for k, v in vocabulary.items()}` and passed it to `run_epoch`.

Finally, we can get the predicted word in `run_epoch` by running `current_word = vocabulary[np.argmax(prob, 1)]`, where `prob` is the tensor returned from `session.run()`.
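The lookup step can be illustrated in isolation. This is a toy sketch with a made-up three-word vocabulary and a made-up `prob` array standing in for the probabilities returned by `session.run()`; note that `np.argmax(prob, 1)` returns an array of ids (one per time step), so each id must be converted to a plain int before indexing the inverted dict:

```python
import numpy as np

# Hypothetical word-to-id mapping like the one built by reader.py.
word_to_id = {"the": 0, "cat": 1, "sat": 2}

# Invert it so ids map back to words.
id_to_word = {v: k for k, v in word_to_id.items()}

# Stand-in for the probabilities tensor returned by session.run():
# one row per time step, one column per vocabulary word.
prob = np.array([[0.1, 0.7, 0.2],
                 [0.2, 0.3, 0.5]])

# Look up the highest-probability word at each time step.
predicted = [id_to_word[int(i)] for i in np.argmax(prob, 1)]
```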

Edit: Reverting the code like this should not be a permanent solution, and I definitely recommend using @Prophecies' answer above to get the `probabilities` tensor. However, if you want the word mapping, you will need to pass the vocabulary as I did here.
