SpeakSsmlAsync returns BadRequest

Question

When calling SpeakSsmlAsync (Microsoft Speech SDK), the following error message is returned:

> CANCELED: Reason=Error
> CANCELED: ErrorCode=BadRequest 
> CANCELED: ErrorDetails=[HTTPAPI result code = HTTPAPI_OK. HTTP status code=400.] 
> CANCELED: Did you update the subscription info?

Steps to reproduce:

1. Download the Quickstart sample from https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/quickstart/text-to-speech/csharp-dotnet-windows
2. Replace Subscription ID and region with your own values, set the active configuration as described in the documentation, then clean and rebuild the project.
3. Start the program and enter some text like "abracadabra".

   --> Works fine (uses SpeakTextAsync)
4. Replace SpeakTextAsync with SpeakSsmlAsync.
5. Start the program and enter some text.

   --> ErrorCode=BadRequest
6. Retry with proper SSML code like <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">abracadabra</speak>

   --> ErrorCode=BadRequest

System

.NET Framework 4.6.1
Windows 10 Build 17134
Service Region = "westeurope"

Code

using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;

namespace helloworld
{
    class Program
    {

        private static string endpointSpeechKey = "<MyOwnServiceKey>";
        private static string region = "westeurope";

        public static async Task SynthesisToSpeakerAsync()
        {
            var config = SpeechConfig.FromSubscription(endpointSpeechKey, region);
            using (var synthesizer = new SpeechSynthesizer(config))
            {
                Console.WriteLine("Type some text that you want to speak...");
                Console.Write("> ");
                string text = Console.ReadLine();

                using (var result = await synthesizer.SpeakSsmlAsync(text))
                {
                    if (result.Reason == ResultReason.SynthesizingAudioCompleted)
                    {
                        Console.WriteLine($"Speech synthesized to speaker for text [{text}]");
                    }
                    else if (result.Reason == ResultReason.Canceled)
                    {
                        var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
                        Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");

                        if (cancellation.Reason == CancellationReason.Error)
                        {
                            Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                            Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
                            Console.WriteLine($"CANCELED: Did you update the subscription info?");
                        }
                    }
                }

                // This is to give some time for the speaker to finish playing back the audio
                Console.WriteLine("Press any key to exit...");
                Console.ReadKey();
            }
        }

        static void Main()
        {
            SynthesisToSpeakerAsync().Wait();
        }
    }
}

Debug Screenshot

score 3 · Answer 1 · answered Jun 04 '19 at 09:57

Azure seems to accept SSML only when a voice element is included. Otherwise you get the HTTP 400 (BadRequest) error.

With the code below the call to SpeakSsmlAsync works successfully:

text = @"<speak version='1.0' xmlns='http://www.w3.org/2001/10/synthesis' xml:lang='en-US'><voice name='en-US-ZiraRUS'>abracadabra</voice></speak>";
using (var result = await synthesizer.SpeakSsmlAsync(text))
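
If the text comes from user input, as in the quickstart, it can be wrapped in the required speak and voice elements before the call. The following is only a minimal sketch: SsmlBuilder and Wrap are hypothetical names of my own (not part of the Speech SDK), the default voice name is simply the one from the line above and may not be available for every subscription or region, and System.Xml.Linq is used so that characters such as < and & in the input are escaped.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper, not part of the Speech SDK.
static class SsmlBuilder
{
    // W3C SSML namespace (note: http, not https).
    private static readonly XNamespace Ssml = "http://www.w3.org/2001/10/synthesis";

    // Wraps plain text in <speak><voice name="...">text</voice></speak>.
    // voiceName and lang defaults are assumptions taken from the example above.
    public static string Wrap(string text, string voiceName = "en-US-ZiraRUS", string lang = "en-US")
    {
        var speak = new XElement(Ssml + "speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", lang),
            new XElement(Ssml + "voice",
                new XAttribute("name", voiceName),
                text)); // string content is XML-escaped on serialization

        // Single-line output, no indentation added.
        return speak.ToString(SaveOptions.DisableFormatting);
    }
}
```

In the quickstart this would be used as synthesizer.SpeakSsmlAsync(SsmlBuilder.Wrap(text)) in place of passing the raw console input.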

Watch out when searching for Microsoft SSML. There is a difference between

https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup

(which is what you want when programming against Azure Speech services) and

https://learn.microsoft.com/en-us/cortana/skills/speech-synthesis-markup-language

Frank im Wald

My understanding is Voice is optional. So the question might be why are you getting a 400 error? Could it be that Speech Service is looking for a default voice that isn't installed? I would tag this as a defect on the speech service documentation and/or use feedback to report. – Micromuncher Jun 04 '19 at 15:53
@Micromuncher I just got an answer on the MSDN forum confirming the behaviour. They'll check if they can fix it https://social.msdn.microsoft.com/Forums/en-US/efa7777e-3709-4537-8c84-59e865b0f65e/speech-sdk-speakssmlasync-returns-badrequest#234dc795-b1ab-41ce-80e2-4b102ba99b82 – Frank im Wald Jun 05 '19 at 07:34
Can you actually get hold of the audio file that is returned and automatically played back? As every new request fetches a new audio file from the server (even if the text didn’t change) it would be convenient to stash away the audio file for later use and to play it back from that ‘cache’. Anybody? – mramosch Apr 13 '21 at 22:16
@mramosch - yes, sure, that's what you will normally do. If it doesn't work for you, maybe posting a corresponding question would be the right way to go. – Frank im Wald Apr 14 '21 at 15:03
I thought I’d better put the question here where people participate that seem to have the required knowledge to help me out. Usually when I post some new issue, I get no response at all... – mramosch Apr 15 '21 at 12:39
@mramosch Got it, but I dare to say that you won't get any answer this way either. Just post your question, make sure to add all relevant information (especially, what exactly the actual question is), add appropriate tags and there is a chance that people will find it. – Frank im Wald Apr 15 '21 at 17:31

score 0 · Answer 2 · answered Dec 06 '19 at 23:40

Yes, the Azure TTS service only accepts SSML that includes a voice tag.

The reason is that there are so many voices that it is better to specify explicitly which one to use.

Sheng
