Speechmatics submit a job without audio argument

Question

I have implemented a Speechmatics speech-to-text application using their API, as described in this document, with the code below:

from speechmatics.models import ConnectionSettings
from speechmatics.batch_client import BatchClient
from httpx import HTTPStatusError 

API_KEY = "YOUR_API_KEY"
PATH_TO_FILE = "example.wav"
LANGUAGE = "en"

settings = ConnectionSettings(
    url="https://asr.api.speechmatics.com/v2",
    auth_token=API_KEY,
)

# Define transcription parameters
conf = {
    "type": "transcription",
    "transcription_config": { 
        "language": LANGUAGE 
    }
}

# Open the client using a context manager
with BatchClient(settings) as client:
    try:
        job_id = client.submit_job(
            audio=PATH_TO_FILE,
            transcription_config=conf,
        )
        print(f'job {job_id} submitted successfully, waiting for transcript')

        # Note that in production, you should set up notifications instead of polling. 
        # Notifications are described here: https://docs.speechmatics.com/features-other/notifications
        transcript = client.wait_for_completion(job_id, transcription_format='txt')
        # To see the full output, try setting transcription_format='json-v2'.
        print(transcript)
    except HTTPStatusError:
        print('Invalid API key - Check your API_KEY at the top of the code!')

The code passes a local file to the submit_job function. I want to submit a job that uses fetch_data with a URL instead of a local file.

However, the submit_job function requires an audio argument.

I want to use only the fetch_data option, as described here, with no audio argument, as in the config below:

conf = {
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "diarization": "speaker"
  },
  "fetch_data": {
    "url": "${URL}/{FILENAME}"
  }
}

How can I use the fetch_data configuration given above and call submit_job without an audio file as an argument?

Tudor Evans · Accepted Answer · 2023-06-19T09:16:31.717

Unfortunately, I don't think the Speechmatics Python client currently supports the fetch_data feature. I'm a senior software engineer at Speechmatics, and this is a known problem that we're looking into.

It is possible to send fetch_data to the server along with an empty audio file, but the request gets rejected with a 400 error because the server can't accept both inputs at once, so for now there is no solution that uses the SDK.

However, the SDK is really just a thin wrapper around the RESTful API. It is possible to write a simple Python script that uses the requests module to achieve the same thing. I wrote the script below, tested it against a Wikimedia audio file, and it worked okay.

It sends a basic HTTP POST request, then uses the job_id to poll the job status until the job is no longer running. Then it gets the transcript (which defaults to JSON format) and prints it out as a raw string, which can be parsed with json.loads(). Here's the code:

import requests
import json
import time

API_KEY = "YOUR_API_KEY"
LANGUAGE = "en"
AUDIO_URL = "YOUR_URL"

conf = {
    "type": "transcription",
    "transcription_config": {"language": LANGUAGE, "diarization": "speaker"},
    "fetch_data": {"url": AUDIO_URL},
}

response = requests.post(
    "https://asr.api.speechmatics.com/v2/jobs",
    data={"config": json.dumps(conf).encode()},
    files=dict(config=None),
    headers={"Authorization": f"Bearer {API_KEY}"},
)

print(response.content)
job_id = json.loads(response.content)["id"]

job = requests.get(
    f"https://asr.api.speechmatics.com/v2/jobs/{job_id}",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
status = json.loads(job.content)["job"]["status"]

while status == "running":
    time.sleep(10)
    job = requests.get(
        f"https://asr.api.speechmatics.com/v2/jobs/{job_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    status = json.loads(job.content)["job"]["status"]
    print(status)

transcript = requests.get(
    f"https://asr.api.speechmatics.com/v2/jobs/{job_id}/transcript",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(transcript.content)
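
The loop above keeps polling for as long as the job reports "running", with no upper limit. As a minimal sketch of the same idea with a time limit, something like the following could be used. Note that wait_while_running and its parameters are hypothetical names for illustration, not part of the Speechmatics API, and the status lookup is passed in as a function so the helper makes no assumption about how it is fetched.

```python
import time


def wait_while_running(get_status, interval=10.0, timeout=3600.0, sleep=time.sleep):
    """Call get_status() until it stops returning "running" or timeout elapses.

    get_status is any zero-argument callable returning the job status string,
    e.g. a function wrapping the GET /jobs/{job_id} request from the script.
    (Hypothetical helper; not part of the Speechmatics SDK.)
    """
    deadline = time.monotonic() + timeout
    status = get_status()
    while status == "running":
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still running after {timeout} seconds")
        sleep(interval)
        status = get_status()
    # Return whatever non-running status the server reported.
    return status
```

In the script above, get_status would wrap the requests.get call and return json.loads(job.content)["job"]["status"]. The caller should still check the returned status before fetching the transcript, since a job can stop running without having succeeded.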

In case you were wondering about the "files" argument in the POST request: I passed a dict whose only value is None to force the request into the multipart/form-data MIME type, because the server only accepts multipart/form-data. You can read more about that here.
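
To see the effect of the files argument without sending anything, the request can be built and inspected locally. This is a small sketch, assuming a reasonably recent version of requests; the audio URL is a placeholder and nothing is sent to the server.

```python
import json

import requests

conf = {
    "type": "transcription",
    "transcription_config": {"language": "en"},
    "fetch_data": {"url": "https://example.com/audio.wav"},  # placeholder URL
}
url = "https://asr.api.speechmatics.com/v2/jobs"
form = {"config": json.dumps(conf)}

# Without files, requests URL-encodes the form data.
plain = requests.Request("POST", url, data=form).prepare()

# With a files dict whose only value is None, requests switches to multipart
# encoding but adds no file part; the config still travels as a form field.
multipart = requests.Request(
    "POST", url, data=form, files={"config": None}
).prepare()

print(plain.headers["Content-Type"])
print(multipart.headers["Content-Type"])
```

The first header should be the URL-encoded form type and the second a multipart/form-data type with a generated boundary.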

Hopefully, the SDK will get fixed soon, but for now this is the best type of approach available. Hope that helps!

P.S. There is already an open issue on GitHub about this from February, but we've not had time to get round to it yet :(

UPDATE - 19 June 23

We finally got round to fixing this bug and releasing the fix - huzzah! You should now be able to use fetch_data with the Python client as in the example you gave above; you just need to set audio=None. Here's an example using a Wikimedia file:

from speechmatics.models import ConnectionSettings
from speechmatics.batch_client import BatchClient
from httpx import HTTPStatusError 

# Define transcription parameters
conf = {
  "type": "transcription",
  "transcription_config": {
    "language": "en",
    "diarization": "speaker"
  },
  "fetch_data": {
    "url": "https://upload.wikimedia.org/wikipedia/commons/8/83/%28eng%29-%28US%29-Man-of-war.wav"
  }
}

# Open the client using a context manager
with BatchClient() as client:
    try:
        job_id = client.submit_job(
            audio=None,
            transcription_config=conf,
        )
        print(f'job {job_id} submitted successfully, waiting for transcript')
        transcript = client.wait_for_completion(job_id, transcription_format='txt')
        print(transcript)
    except HTTPStatusError:
        print('Invalid API key - Check your API_KEY at the top of the code!')

It's worth noting that this example also makes use of a few other recent changes, which is why it has fewer configuration steps than the previous ones. The Python client will now read the auth and URL config from a local TOML file, which can be set with a CLI command of the form speechmatics config set --{arg_name} {arg_value}. Config can also still be provided in the previous manner.
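
For anyone who prefers to keep passing the config explicitly, here is a minimal sketch that combines the ConnectionSettings setup from the question with audio=None. The helper names build_fetch_config and transcribe_from_url are hypothetical, chosen only for illustration; the client calls are the same ones used in the examples above.

```python
def build_fetch_config(audio_url, language="en", diarization="speaker"):
    """Build a job config that tells the server to fetch audio from a URL.

    (Hypothetical helper; the dict layout mirrors the configs shown above.)
    """
    return {
        "type": "transcription",
        "transcription_config": {"language": language, "diarization": diarization},
        "fetch_data": {"url": audio_url},
    }


def transcribe_from_url(audio_url, auth_token,
                        api_url="https://asr.api.speechmatics.com/v2"):
    """Submit a fetch_data job with explicit connection settings."""
    # Imported here so the config helper can be used without the SDK installed.
    from speechmatics.batch_client import BatchClient
    from speechmatics.models import ConnectionSettings

    settings = ConnectionSettings(url=api_url, auth_token=auth_token)
    with BatchClient(settings) as client:
        # audio=None: the server downloads the file named in fetch_data.
        job_id = client.submit_job(
            audio=None,
            transcription_config=build_fetch_config(audio_url),
        )
        return client.wait_for_completion(job_id, transcription_format="txt")
```

Calling transcribe_from_url("YOUR_URL", "YOUR_API_KEY") would then behave like the example above, assuming a client version that includes the fix.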
