
I'm trying to get an m4a file transcribed. I'm receiving this file at a FastAPI endpoint and then attempting to send it to OpenAI's transcribe but it seems like the format/shape is off. How can I turn the UploadFile into something that OpenAI will accept? The OpenAI docs for transcribe are essentially:

The transcriptions API takes as input the audio file you want to transcribe and the desired output file format for the transcription of the audio. We currently support multiple input and output file formats.

Here's my current code:

@app.post("/transcribe")
async def transcribe_audio_file(file: UploadFile = File(...)):
    contents = await file.read()
    contents_str = contents.decode()
    buffer = io.StringIO(contents_str)

    transcript_response = openai.Audio.transcribe("whisper-1", buffer)

I've tried several variations of the above code, which produce the following errors:

    transcript_response = openai.Audio.transcribe("whisper-1", file) # AttributeError: 'UploadFile' object has no attribute 'name'
    transcript_response = openai.Audio.transcribe("whisper-1", contents) # AttributeError: 'bytes' object has no attribute 'name'
    transcript_response = openai.Audio.transcribe("whisper-1", contents_str) # UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 13: invalid start byte
    transcript_response = openai.Audio.transcribe("whisper-1", buffer) # UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 13: invalid start byte
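For reference, I confirmed that the UnicodeDecodeError comes from the file being binary rather than text; m4a is a binary container format, so it can't be decoded as UTF-8 (the byte sequence below is made up for illustration, not from my actual file):

```python
# m4a data is binary, not text, so bytes like 0x86 are not valid
# UTF-8 and .decode() raises. Fabricated example bytes:
audio_bytes = b"\x00\x00\x00\x1cftypM4A \x86"
try:
    audio_bytes.decode()
except UnicodeDecodeError as e:
    print(e)  # 'utf-8' codec can't decode byte 0x86 ...
```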

I have something similar working in a vanilla CLI python script that looks like this:

audio_file = open("./audio-file.m4a", "rb")
transcript_response = openai.Audio.transcribe("whisper-1", audio_file)

So I also tried using a method like that:

    with open(file.filename, "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file)

But that gave the error:

FileNotFoundError: [Errno 2] No such file or directory: '6ad52ad0-2fce-4d79-b4ac-e154379ceacd'

Any tips on how to debug this myself are also welcome. I'm coming from TypeScript land.

Brady Dowling

1 Answer


As mentioned in this question, the solution is as follows:

import io

import openai
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

@app.post("/transcribe")
async def transcribe_audio_file(file: UploadFile = File(...)):
    audio = await file.read()
    buffer = io.BytesIO(audio)
    # The client reads the .name attribute, and the extension is how
    # the API infers the audio format, so it should match the real format.
    buffer.name = "audio.m4a"
    transcript_response = openai.Audio.transcribe("whisper-1", buffer)
    return transcript_response
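As far as I can tell, this works because `io.BytesIO` accepts an arbitrary `name` attribute, which the OpenAI client reads the way it would read an open file's filename (hence the `'bytes' object has no attribute 'name'` error without it). A quick standalone check of the pattern, with placeholder bytes standing in for the uploaded contents:

```python
import io

audio = b"fake m4a bytes"  # stand-in for the uploaded file contents
buffer = io.BytesIO(audio)
buffer.name = "audio.m4a"  # extension should match the real audio format

# The buffer now quacks like an open binary file: it has .name and .read()
print(buffer.name)   # audio.m4a
print(buffer.read())  # b'fake m4a bytes'
buffer.close()       # close it once the request is done
```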
Brady Dowling
  • Please remember to close the `buffer` afterwards, as described in [this answer](https://stackoverflow.com/a/73586180/17865804). You might find [this](https://stackoverflow.com/a/70665801/17865804) and [this](https://stackoverflow.com/a/70653605/17865804) helpful as well. – Chris Jun 29 '23 at 04:45