
I am working on a natural language processing project with deep learning, and I downloaded a word embedding file. The file is in .bin format. I can open the file with

file = open("cbow.bin", "rb")

But when I type

file.read(100)

I get

b'4347907 300\n</s> H\xe1\xae:0\x16\xc1:\xbfX\xa7\xbaR8\x8f\xba\xa0\xd3\xee9K\xfe\x83::m\xa49\xbc\xbb\x938\xa4p\x9d\xbat\xdaA:UU\xbe\xba\x93_\xda9\x82N\x83\xb9\xaeG\xa7\xb9\xde\xdd\x90\xbaww$\xba\xfdba:\x14.\x84:R\xb8\x81:0\x96\x0b:\x96\xfc\x06'  

What is this language, and how can I convert it into actual numbers and text using Python?

floyd

1 Answer


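The weird language you are referring to is a Python bytestring (a bytes object): raw binary data rather than text written in some language.

As @jolitti implied, you won't be able to convert this particular bytestring to readable text simply by decoding it, because most of its bytes are binary data rather than characters.

Python displays bytes that correspond to printable ASCII characters as those characters and every other byte as a \x escape. A bytestring made up entirely of characters you recognize would be displayed like this: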

b'Guido van Rossum'
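
To make that concrete, here is a small sketch using the first bytes shown in the question. The readable part at the start decodes as two integers and a word. Treating the bytes after the word as little-endian 32-bit floats is an assumption based on how word2vec-style binary files are usually laid out, not something the file itself confirms.

```python
import struct

# A bytestring of printable ASCII decodes cleanly to text.
readable = b"Guido van Rossum"
print(readable.decode("ascii"))

# The first bytes from the question: a text header line, a word, then raw binary data.
chunk = b"4347907 300\n</s> H\xe1\xae:0\x16\xc1:"

# The header line is plain ASCII: two integers separated by a space.
header, rest = chunk.split(b"\n", 1)
vocab_size, dim = (int(n) for n in header.split())
print(vocab_size, dim)

# The word runs up to the first space; the remaining 8 bytes are binary.
word, raw = rest.split(b" ", 1)
print(word.decode("utf-8"))

# Assumption: each group of 4 bytes is one little-endian float32.
first, second = struct.unpack("<2f", raw)
print(first, second)
```

Read this way, the header would mean 4347907 entries with 300 numbers each, which matches what a word embedding file is expected to contain.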
Nikhil Devadiga
  • So, do I need to contact the authors of the file? How would they help me? – floyd Mar 06 '22 at 13:18
  • Yes, please ask them. The file you have is just a stream of bytes, not text that can be read directly. I don't know what a word embedding file is, but from reading a short summary of word embeddings, I would guess this is a trained model and that you would have to _load_ this file to use it. – Nikhil Devadiga Mar 06 '22 at 13:37
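
As a rough illustration of what loading could look like, the sketch below parses a file laid out as a text header ("<count> <dimensions>"), followed by each word, a space, and that many 4-byte floats. This layout is an assumption (it is the common word2vec binary format, which the header in the question resembles), and `read_vectors` is a hypothetical helper name, not part of any library. The demo runs on a tiny file written in the same layout, not on cbow.bin itself.

```python
import os
import struct
import tempfile


def read_vectors(path, limit=None):
    """Parse a word2vec-style binary file (assumed layout) into {word: [floats]}."""
    vectors = {}
    with open(path, "rb") as f:
        # Header line: "<vocab_size> <dimensions>\n" as ASCII text.
        vocab_size, dim = (int(n) for n in f.readline().split())
        count = vocab_size if limit is None else min(limit, vocab_size)
        for _ in range(count):
            # The word is the run of bytes up to the next space.
            word = bytearray()
            while True:
                ch = f.read(1)
                if ch == b"":
                    raise ValueError("unexpected end of file while reading a word")
                if ch == b" ":
                    break
                if ch != b"\n":  # some writers put a newline after each vector
                    word.extend(ch)
            # Assumption: the vector is `dim` little-endian float32 values.
            raw = f.read(4 * dim)
            if len(raw) != 4 * dim:
                raise ValueError("unexpected end of file while reading a vector")
            key = word.decode("utf-8", errors="replace")
            vectors[key] = list(struct.unpack(f"<{dim}f", raw))
    return vectors


# Demo on a tiny file written in the same assumed layout (2 words, 3 dimensions).
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    tmp.write(b"2 3\n")
    tmp.write(b"</s> " + struct.pack("<3f", 0.5, -1.0, 2.0) + b"\n")
    tmp.write(b"hello " + struct.pack("<3f", 1.0, 0.25, -0.75) + b"\n")
    demo_path = tmp.name

demo = read_vectors(demo_path)
os.remove(demo_path)
print(demo)
```

If the file really is in this format, `read_vectors("cbow.bin", limit=5)` would return the first five words with their vectors. In practice a library loader is usually the safer choice; gensim, for example, provides `KeyedVectors.load_word2vec_format(path, binary=True)` for files in this format.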