Hello Community Members,
I would like to output the 1000 most frequently used words, together with their frequencies, from a Gensim Word2Vec model. However, I am not interested in certain words, so I filter them out using NumPy (np.setdiff1d). After that I create a new list using '/n'.join, but now every time I access an entry from that list, '/n' appears in front of the word (e.g. /nhouse instead of house), so I get a KeyError.
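A small illustration of why joining causes the lookup failure, assuming the '/n' above means the newline character '\n'. It uses a plain dict with made-up counts as a hypothetical stand-in for the model's vocabulary. str.join returns one string, not a list, and a key carrying a stray newline is a different key. Keeping the filtered words as a list avoids the problem. An order-preserving list comprehension is also safer than np.setdiff1d here, because np.setdiff1d returns the sorted unique values, which throws away the frequency ranking.

```python
# Hypothetical stand-in for the model vocabulary: word -> count (made-up numbers)
counts = {"house": 120, "aber": 95, "garden": 40}
stop_words = {"aber"}

# Order-preserving filter: the result is still a list of separate words
new_list = [w for w in counts if w not in stop_words]

# "\n".join builds ONE string; it is only useful for writing or printing
joined = "\n".join(new_list)
print(type(joined).__name__, repr(joined))

# A key with a leading newline is a different key, hence the KeyError
try:
    counts["\nhouse"]
except KeyError as err:
    print("KeyError:", repr(err.args[0]))

# Stripping whitespace (including newlines) recovers the real key
print(counts["\nhouse".strip()])

# Look up counts directly from the list instead of from the joined string
for word in new_list:
    print(word + "\t" + str(counts[word]))
```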
I tried to work around it by saving the list (corpus_words) as a .txt file and reading it back with open(), but even then there is a /n in front of each entry when I try to get the frequency of the word.
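If the words go through a text file, the newline separators come back with them: iterating over a file or calling readlines() keeps the "\n" on each line. A minimal sketch of a clean round trip, using a temporary file as a hypothetical replacement for the corpus_words .txt file; str.splitlines() (or line.strip() per line) removes the separators before the words are used as keys.

```python
import os
import tempfile

words = ["house", "garden", "street"]

# Write one word per line (temporary file standing in for corpus_words.txt)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as f:
    f.write("\n".join(words))
    path = f.name

# readlines() keeps the line terminators, so these are not usable as keys
with open(path, encoding="utf-8") as f:
    raw = f.readlines()
print(raw)  # ['house\n', 'garden\n', 'street']

# read().splitlines() drops the terminators and gives clean words back
with open(path, encoding="utf-8") as f:
    clean = f.read().splitlines()
print(clean)  # ['house', 'garden', 'street']

os.remove(path)
```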
Using a print statement before "/n".join(new_list) did not help either.
Is there any way to fix this?
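```python
from gensim.models import Word2Vec

Model_Pfad = r'D:\OneDrive\Phyton\modelC.model'
model = Word2Vec.load(Model_Pfad)

# With the default sorted_vocab=1, index_to_key is ordered by descending frequency
x = list(model.wv.index_to_key[:1000])

stop_words = {"an", "as", "art", "ab", "al",
              "aber", "abk.", "alle", "allem", "allen",
              "aller", "alles", "allg."}

# Order-preserving filter: new_list stays a list of separate words
new_list = [item for item in x if item not in stop_words]

# Raw string avoids backslash escapes in the Windows path;
# the with-block closes (and flushes) the file automatically
with open(r'D:\OneDrive\Phyton\wigbelsZahlen.txt', 'w', encoding='utf-8') as ausgabe:
    for i in new_list:
        result = model.wv.get_vecattr(i, "count")
        ausgabe.write(i + '\t' + str(result) + '\n')
```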
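One point the code leaves open: filtering after slicing the first 1000 keys yields fewer than 1000 words. A minimal sketch that keeps taking words until n survivors are found. write_top_words is a hypothetical helper; it takes the count lookup as a function so it can be tried without a model, and the counts in the usage lines are made up.

```python
import io
from itertools import islice
from typing import Callable, Iterable, Set, TextIO


def write_top_words(ranked_words: Iterable[str],
                    get_count: Callable[[str], int],
                    stop_words: Set[str],
                    out: TextIO,
                    n: int = 1000) -> int:
    """Write up to n non-stop-words as 'word<TAB>count' lines.

    ranked_words is assumed to be ordered by descending frequency.
    Returns the number of lines written.
    """
    # Filter lazily, then take the first n survivors, so removing stop
    # words does not push the total below n (if enough words exist)
    kept = (w for w in ranked_words if w not in stop_words)
    written = 0
    for word in islice(kept, n):
        out.write(f"{word}\t{get_count(word)}\n")
        written += 1
    return written


# Made-up data standing in for model.wv.index_to_key and get_vecattr
counts = {"haus": 500, "aber": 300, "garten": 200, "alle": 150, "strasse": 90}
buf = io.StringIO()
n_written = write_top_words(counts, counts.__getitem__, {"aber", "alle"}, buf, n=2)
print(buf.getvalue())
```

With gensim 4 the call would presumably be write_top_words(model.wv.index_to_key, lambda w: model.wv.get_vecattr(w, "count"), stop_words, ausgabe), with ausgabe opened in a with-block as above.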