4

How can I transcribe a speech file with the Bing Speech API in Python? My speech file is longer than 15 seconds.


I'm aware that one may use the Bing Speech REST API in Python. https://gist.github.com/jellis505/973ea6de12508c7c720da4a074e7d065 gives an example in Python 2:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import requests
import httplib
import uuid
import json

class Microsoft_ASR():
    def __init__(self):
        self.sub_key = 'YourKeyHere'
        self.token = None
        pass

    def get_speech_token(self):
        FetchTokenURI = "/sts/v1.0/issueToken"
        header = {'Ocp-Apim-Subscription-Key': self.sub_key}
        conn = httplib.HTTPSConnection('api.cognitive.microsoft.com')
        body = ""
        conn.request("POST", FetchTokenURI, body, header)
        response = conn.getresponse()
        str_data = response.read()
        conn.close()
        self.token = str_data
        print "Got Token: ", self.token
        return True

    def transcribe(self,speech_file):

        # Grab the token if we need it
        if self.token is None:
            print "No Token... Getting one"
            self.get_speech_token()

        endpoint = 'https://speech.platform.bing.com/recognize'
        request_id = uuid.uuid4()
        # Params form Microsoft Example 
        params = {'scenarios': 'ulm',
                  'appid': 'D4D52672-91D7-4C74-8AD8-42B1D98141A5',
                  'locale': 'en-US',
                  'version': '3.0',
                  'format': 'json',
                  'instanceid': '565D69FF-E928-4B7E-87DA-9A750B96D9E3',
                  'requestid': uuid.uuid4(),
                  'device.os': 'linux'}
        content_type = "audio/wav; codec=""audio/pcm""; samplerate=16000"

        def stream_audio_file(speech_file, chunk_size=1024):
            with open(speech_file, 'rb') as f:
                while 1:
                    data = f.read(1024)
                    if not data:
                        break
                    yield data

        headers = {'Authorization': 'Bearer ' + self.token, 
                   'Content-Type': content_type}
        resp = requests.post(endpoint, 
                            params=params, 
                            data=stream_audio_file(speech_file), 
                            headers=headers)
        val = json.loads(resp.text)
        return val["results"][0]["name"], val["results"][0]["confidence"]

if __name__ == "__main__":
    ms_asr = Microsoft_ASR()
    ms_asr.get_speech_token()
    text, confidence = ms_asr.transcribe('Your Wav File Here')
    print "Text: ", text
    print "Confidence: ", confidence

However, the Bing Speech REST API cannot convert audio files longer than 15 seconds according to https://learn.microsoft.com/en-us/azure/cognitive-services/speech/home:

enter image description here

Franck Dernoncourt
  • 77,520
  • 72
  • 342
  • 501

2 Answers2

0

you can convert large files upto the extent of 10 min using bing speech but you need to build a websocket for it as it the other alternative within bing for large audio files. Here is the github repo bing speech

jatin singh
  • 123
  • 1
  • 1
  • 13
0

You can also use batch transcription with larger files.

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription

The files need to be stored on Azure Blob Storage first.

korakot
  • 37,818
  • 16
  • 123
  • 144