Unfortunately, I don't think the Speechmatics Python client currently supports the fetch_data feature. I'm a senior software engineer at Speechmatics, and this is a known problem that we're looking into.
You can send fetch_data to the server alongside an empty audio file, but the request is rejected with a 400 error because the server can't accept both inputs at once, so for now there is no solution that uses the SDK.
However, the SDK is really just a thin wrapper around the RESTful API, so a simple Python script that uses the requests module can achieve the same thing. I wrote the script below and tested it against a Wikimedia audio file, and it worked okay.
It sends a basic HTTP POST request, then uses the returned job ID to poll the job status until the job is no longer running. Then it fetches the transcript (which defaults to JSON format) and prints it out as raw bytes rather than parsed JSON (you can parse it with json.loads()). Here's the code:
import requests
import json
import time

API_KEY = "YOUR_API_KEY"
LANGUAGE = "en"
AUDIO_URL = "YOUR_URL"

# Job config: fetch_data tells the server to download the audio itself
conf = {
    "type": "transcription",
    "transcription_config": {"language": LANGUAGE, "diarization": "speaker"},
    "fetch_data": {"url": AUDIO_URL},
}

# Submit the job; the files argument forces a multipart/form-data request
response = requests.post(
    "https://asr.api.speechmatics.com/v2/jobs",
    data={"config": json.dumps(conf).encode()},
    files=dict(config=None),
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(response.content)
job_id = json.loads(response.content)["id"]

# Check the job status once, then poll every 10 seconds while it is running
job = requests.get(
    f"https://asr.api.speechmatics.com/v2/jobs/{job_id}",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
status = json.loads(job.content)["job"]["status"]
while status == "running":
    time.sleep(10)
    job = requests.get(
        f"https://asr.api.speechmatics.com/v2/jobs/{job_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    status = json.loads(job.content)["job"]["status"]
    print(status)

# Fetch the transcript (JSON by default) and print the raw response body
transcript = requests.get(
    f"https://asr.api.speechmatics.com/v2/jobs/{job_id}/transcript",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(transcript.content)
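The polling loop in the script keeps going for as long as the status is "running" and has no upper time limit. If you adapt it, one option is to wrap the wait in a small helper with a timeout. This is only a sketch: the name wait_until_finished, its parameters and the default timeout are made up for illustration, and like the script it simply treats any status other than "running" as final.

```python
import time
from typing import Callable


def wait_until_finished(
    get_status: Callable[[], str],
    interval: float = 10.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns something other than "running".

    get_status is any zero-argument callable returning the job status
    string (hypothetical hook, e.g. a function wrapping the GET request).
    Raises TimeoutError if the job is still running after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status != "running":
            # Any non-running status is treated as final, as in the script
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job still running after timeout")
        sleep(interval)
```

In the script, get_status would be a small function that performs the GET on /v2/jobs/{job_id} and returns the ["job"]["status"] field. It is worth checking the returned status before requesting the transcript, since a job that stopped running did not necessarily succeed.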
In case you were wondering why the "files" argument is there: it contains no actual file, and its only purpose is to force the request into the multipart/form-data MIME type, which is the only one the server accepts. You can read more about that here
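If you want to see that effect without sending anything, you can build the request with requests and inspect it before it goes on the wire. This is a small sketch; the config values and the audio URL are placeholders, and nothing here contacts the Speechmatics server.

```python
import json

import requests

# Placeholder config, same shape as in the script above
conf = {
    "type": "transcription",
    "transcription_config": {"language": "en"},
    "fetch_data": {"url": "https://example.com/audio.wav"},
}

# Build the request without sending it
prepared = requests.Request(
    "POST",
    "https://asr.api.speechmatics.com/v2/jobs",
    data={"config": json.dumps(conf)},
    files={"config": None},  # no file content, only switches the encoding
).prepare()

content_type = prepared.headers["Content-Type"]
print(content_type)          # starts with "multipart/form-data; boundary="
print(prepared.body[:200])   # the config travels as a regular form field
```

Without the files argument, requests would encode the same data as application/x-www-form-urlencoded instead.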
Hopefully, the SDK will get fixed soon, but for now this is the best type of approach available. Hope that helps!
P.S. There is already an open issue on GitHub about this from February, but we've not had time to get round to it yet :(
UPDATE - 19 June 23
We finally got round to fixing this bug and releasing the fix - huzzah! You should now be able to use fetch_data with the Python client as in the example you have given above; you just need to set audio=None. Here's an example using a Wikimedia file:
from speechmatics.models import ConnectionSettings
from speechmatics.batch_client import BatchClient
from httpx import HTTPStatusError

# Define transcription parameters
conf = {
    "type": "transcription",
    "transcription_config": {
        "language": "en",
        "diarization": "speaker"
    },
    "fetch_data": {
        "url": "https://upload.wikimedia.org/wikipedia/commons/8/83/%28eng%29-%28US%29-Man-of-war.wav"
    }
}

# Open the client using a context manager
with BatchClient() as client:
    try:
        # audio=None because the server fetches the audio from fetch_data
        job_id = client.submit_job(
            audio=None,
            transcription_config=conf,
        )
        print(f'job {job_id} submitted successfully, waiting for transcript')
        transcript = client.wait_for_completion(job_id, transcription_format='txt')
        print(transcript)
    except HTTPStatusError as e:
        if e.response.status_code == 401:
            print('Invalid API key - check the auth token in your config!')
        else:
            # Don't hide other HTTP errors behind the API key message
            raise
It's worth noting that this example also makes use of a few other recent changes, which is why it has fewer configuration steps than the previous ones. The Python client now reads auth and URL config from a local TOML file, which can be set with a CLI command like speechmatics config set --{arg_name} {arg_value}. Config can also still be provided in the previous manner.