I'm trying to get an m4a file transcribed. I'm receiving the file at a FastAPI endpoint and then attempting to send it to OpenAI's transcription API, but it seems like the format/shape of what I'm passing is off. How can I turn the UploadFile into something that OpenAI will accept? The OpenAI docs for transcription are essentially:
The transcriptions API takes as input the audio file you want to transcribe and the desired output file format for the transcription of the audio. We currently support multiple input and output file formats.
Here's my current code:
import io
import openai
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

@app.post("/transcribe")
async def transcribe_audio_file(file: UploadFile = File(...)):
    contents = await file.read()
    contents_str = contents.decode()
    buffer = io.StringIO(contents_str)
    transcript_response = openai.Audio.transcribe("whisper-1", buffer)
I've tried several variations of the above, which raise the respective errors:
transcript_response = openai.Audio.transcribe("whisper-1", file) # AttributeError: 'UploadFile' object has no attribute 'name'
transcript_response = openai.Audio.transcribe("whisper-1", contents) # AttributeError: 'bytes' object has no attribute 'name'
transcript_response = openai.Audio.transcribe("whisper-1", contents_str) # UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 13: invalid start byte
transcript_response = openai.Audio.transcribe("whisper-1", buffer) # UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 13: invalid start byte
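From those errors, my current guess is that the SDK wants a binary file-like object that also has a `.name` attribute it can use to infer the audio format (the UploadFile's name lives on `file.filename` instead). Here's a sketch of what I'm considering trying, though I'm not sure it's the right approach; the header bytes are just a stand-in for the real upload:

```python
import io

# Stand-in for `contents = await file.read()` -- a few fake m4a-ish bytes.
contents = b"\x00\x00\x00\x1cftypM4A "

# Wrap the raw bytes in a *binary* buffer instead of decoding them to text...
buffer = io.BytesIO(contents)
# ...and attach a filename so the format can be inferred from the extension.
buffer.name = "upload.m4a"

# The actual call would then be:
# transcript_response = openai.Audio.transcribe("whisper-1", buffer)
```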
I have something similar working in a vanilla CLI Python script that looks like this:
audio_file = open("./audio-file.m4a", "rb")
transcript_response = openai.Audio.transcribe("whisper-1", audio_file)
So I also tried a similar approach:
with open(file.filename, "rb") as audio_file:
    transcript = openai.Audio.transcribe("whisper-1", audio_file)
But that gave the error:
FileNotFoundError: [Errno 2] No such file or directory: '6ad52ad0-2fce-4d79-b4ac-e154379ceacd'
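The UUID in that error made me realize `file.filename` is presumably just the client-supplied name, not a path on disk. Another thing I'm considering is spooling the upload to a real temporary file first, since that would mirror my working CLI script (the `.m4a` suffix and the stand-in bytes are assumptions here):

```python
import os
import tempfile

contents = b"fake m4a bytes"  # stand-in for `await file.read()`

# Write the upload to an actual file on disk, keeping an .m4a extension.
with tempfile.NamedTemporaryFile(suffix=".m4a", delete=False) as tmp:
    tmp.write(contents)
    tmp_path = tmp.name

try:
    with open(tmp_path, "rb") as audio_file:
        round_trip = audio_file.read()
        # transcript = openai.Audio.transcribe("whisper-1", audio_file)
finally:
    os.remove(tmp_path)  # clean up the temp file either way
```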
Any tips on how to debug this myself are also welcome. I'm coming from TypeScript land.