I am trying to use the transcribe method from OpenAI's whisper Python module without loading the audio from the file system. In my code I have downloaded an Ogg audio file from a Matrix server's media repository and now want to transcribe it. whisper.transcribe only accepts a file path, an np.ndarray, or a Tensor as input. Simply converting the bytes to an np.ndarray or a Tensor fails, as the resulting array seems to be missing vital information. Is there other code inside the whisper API that I can use to achieve this without first writing the bytes to a file and then reading them back?
import whisper
# download_media returns the Ogg file behind the mxc:// URL as bytes
audio = await self.client.download_media(evt.content.url)
model = whisper.load_model("base")
# Workaround: write the bytes to disk so that whisper can read them back
with open("my_file.ogg", "wb") as f:
    f.write(audio)
result = model.transcribe("my_file.ogg")
This code works, but it looks like a quick hack rather than a clean solution, so I am wondering if there is a better option.
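One direction that might avoid the temporary file, as a rough and untested sketch: the downloaded bytes are a compressed Ogg container, whereas transcribe expects decoded 16 kHz mono float32 samples when it is given an array, which would explain why a direct byte-to-array conversion fails. The sketch below assumes ffmpeg is on the PATH (whisper already relies on it for file input) and pipes the bytes through it via stdin. The helper names decode_audio_bytes and pcm16_to_float32 are hypothetical, not part of the whisper API.

```python
import subprocess

import numpy as np

SAMPLE_RATE = 16000  # sample rate whisper expects for array input


def pcm16_to_float32(raw: bytes) -> np.ndarray:
    """Convert little-endian signed 16-bit PCM bytes to float32 in [-1, 1)."""
    return np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0


def decode_audio_bytes(data: bytes, sample_rate: int = SAMPLE_RATE) -> np.ndarray:
    """Decode compressed audio bytes (e.g. Ogg) in memory via an ffmpeg pipe.

    Hypothetical helper: assumes the ffmpeg executable is installed and
    can read this container format from stdin.
    """
    cmd = [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-i", "pipe:0",          # read the compressed audio from stdin
        "-f", "s16le",           # write raw 16-bit little-endian PCM
        "-acodec", "pcm_s16le",
        "-ac", "1",              # downmix to mono
        "-ar", str(sample_rate),
        "pipe:1",                # write the decoded samples to stdout
    ]
    proc = subprocess.run(cmd, input=data, capture_output=True, check=True)
    return pcm16_to_float32(proc.stdout)
```

If this works, the call in my code would become result = model.transcribe(decode_audio_bytes(audio)), with no file written in between. Since subprocess.run blocks, inside an async handler it would probably have to be moved off the event loop.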