I have trained a word2vec model using gensim. In the model's matrix, some floating-point values are displayed like this: "-7.18556e-05"

I need to use the values in the matrix as strings. Is there a way to remove the "e-05", "e-04", etc.?

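import nltk
from gensim.models import Word2Vec
from nltk.corpus import stopwords

text = "My text is here"
stop_words = set(stopwords.words('english'))

# Split the text into sentences, then each sentence into words,
# dropping stopwords (Word2Vec expects a list of token lists)
sentences = [
    [word for word in nltk.word_tokenize(sentence) if word not in stop_words]
    for sentence in nltk.sent_tokenize(text)
]

model = Word2Vec(sentences, min_count=1)

# gensim 3.x API; in gensim 4+ use model.wv.key_to_index instead
words = model.wv.vocab

# One row per vocabulary word
matrix = model.wv[list(words.keys())]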
  • That is only how Python displays the value as a string; in memory it is still stored as a float. With string formatting you can display it in a different way - https://pyformat.info/ – furas May 17 '20 at 14:11
  • BTW: `print("{:.5f}".format(-7.18556e-05))` gives `-0.00007` and `print("{:.10f}".format(-7.18556e-05))` gives `-0.0000718556` – furas May 17 '20 at 14:16
  • While the formatting suggestions will work, why do you need to use those values as a string? (That exponent-notation is in fact a string, and is understood by Python & many reading-routines, & even many human readers. So knowing specifically the exact purpose of your intended string-representation will allow the right suggestions for formatting – or even identification of situations where the existing representation is more OK than you might think.) – gojomo May 17 '20 at 20:43

1 Answer

Note that those scientific-notation printouts are valid strings, & will be understood by Python & many reading routines that might be used on your output.

And, when printing for some very specific purpose, there are various formatting options (including the .format() options mentioned by comments) to get exactly what you need. (You haven't shown what methods of triggering matrix/array display you're currently using, so it's not clear what suggestions for altering the display, at the key output points, are best.)
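As a minimal sketch of that explicit-formatting approach: the `row` values and the `to_fixed` helper below are hypothetical, standing in for one row of your matrix, and 10 decimal places is an arbitrary choice you'd adjust to your purpose.

```python
# Hypothetical sample values standing in for one row of the model's matrix
row = [-7.18556e-05, 0.00123456, 0.25]


def to_fixed(value, decimals=10):
    """Format a number in fixed-point notation, never with an exponent."""
    return "{:.{}f}".format(value, decimals)


# Build the string form of every value explicitly
strings = [to_fixed(v) for v in row]
print(strings)
```

If the values come straight from a numpy array, wrapping each one as `float(v)` before formatting keeps the helper working on plain Python floats.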

But also: the vectors & matrices from gensim and most similar libraries are typically numpy arrays, and numpy has a global setting to alter display options, `numpy.set_printoptions()`, including a `suppress` parameter that turns off such notation for small values. See this other answer for more details:

https://stackoverflow.com/a/2891805/130288
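A rough sketch of that global setting, assuming a small made-up array in place of your real word-vector matrix (the `precision=10` value is also just an illustrative choice):

```python
import numpy as np

# Hypothetical small array standing in for the word-vector matrix
matrix = np.array([[-7.18556e-05, 0.00123456], [0.25, -1.5]], dtype=np.float32)

# Global setting: display small values in fixed-point notation
# instead of scientific notation, with up to 10 decimal places
np.set_printoptions(suppress=True, precision=10)

# Any later str()/print() of an array picks up the setting
as_text = str(matrix)
print(as_text)
```

Note that this changes how every array is displayed from that point on, not just this one.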

Ultimately, you may not want to rely on this being set, at some prior time & globally, to get your desired output at one specific intentional place. It'd be clearer, more robust code to explicitly format the results for the purpose. But as a quick fix, the above may fit your need.
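If you'd rather not touch the global state, here's one possible sketch of keeping the formatting local. It assumes a reasonably recent numpy (the `np.printoptions` context manager arrived in 1.15), and the sample array is again made up:

```python
import numpy as np

# Hypothetical small array standing in for the word-vector matrix
matrix = np.array([[-7.18556e-05, 0.00123456], [0.25, -1.5]], dtype=np.float32)

# Option A: the display options apply only inside the with-block
with np.printoptions(suppress=True, precision=10):
    block_text = str(matrix)

# Option B: format each value explicitly, independent of any print settings
rows_as_strings = [
    [np.format_float_positional(value) for value in row]
    for row in matrix
]

print(block_text)
print(rows_as_strings)
```

Option B gives you one string per value, which is probably closer to what "use the values as a string" needs than a printout of the whole array.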

gojomo