How to add audio from Firebase Storage in Actions on Google?

Question

To be clear, I want to know how to add audio from Firebase Storage in Actions on Google. I've been stuck on this for a few weeks. I uploaded my audio to Firebase Storage, copied the link Firebase provided, and pasted the audio's URL into the speech output in the following format:

<speak>
<audio src="https://firebasestorage.googleapis.com/v0/b/enrich-58fdf.appspot.com/o/xxx.mp3?alt=media&token=aabcd430-9d46-45f6-ad21-fdca0895123f">
</audio>
</speak>

But this didn't work. A few months ago in the Google+ community, I found someone who had asked a similar question, and Allen Firstenberg replied that amp; should be added between & and token. That gives the following:

<speak>
<audio src="https://firebasestorage.googleapis.com/v0/b/enrich-58fdf.appspot.com/o/xxx.mp3?alt=media&amp;token=aabcd430-9d46-45f6-ad21-fdca0895123f">
</audio>
</speak>

But this also didn't work. I think that after some changes to SSML, the code or the format may have changed in a way I'm not aware of. Can anyone help me out?

If you go to the audio URL in an unauthenticated browser window, does it play automatically, or does it ask you to log in? — Conor Livingston, Jan 02 '18 at 14:21
Is there a reason why you are using Firebase Storage? For the audio in my voice app Daily Affirmation, I use Firebase Hosting. I found it very easy to set up. It goes something like: firebase init, — SysCoder, Jan 02 '18 at 17:50
Can you give an example URL that is valid? Trying your first URL doesn't work, so I assume "xxx.mp3" was just an example. — Prisoner, Jan 02 '18 at 21:22
@Prisoner Yes, xxx.mp3 was just an example. The valid URL is:- gs://enrich-58fdf.appspot.com/welcome.mp3 — Raghav Joshi, Jan 03 '18 at 14:24
@SysCoder Yes, I know about firebase hosting, but I want to do it without any fulfillment. — Raghav Joshi, Jan 03 '18 at 14:25
Firebase Hosting wouldn't require any fulfillment. It would make a public URL for the audio available (without any of the token stuff - it just acts like a web server). — Prisoner, Jan 03 '18 at 14:46
What happens when you say it "didn't work"? What error are you getting in the simulator? What is the *exact* SSML you're using, and exactly where are you using it? — Prisoner, Jan 03 '18 at 14:50
@SysCoder - Happy to discuss this further in an appropriate forum, but Hosting isn't desirable when you have dynamic content. There is no API to save content into Hosting, but there is to save content into Storage. So if you're generating audio to be used in an Action, you want to use something more like Storage. If you have static assets, Hosting does make more sense. (And is a lot easier to use!) — Prisoner, Jan 03 '18 at 14:59

score 2 · Answer 1 · answered Jan 02 '18 at 16:51

Try escaping the & by replacing & with &amp;.

If you're still having trouble, try adding some text between the </audio> tag and the </speak> tag. Actions on Google requires either display text alongside the SSML, or SSML that can itself be represented visually. Adding text in the SSML allows it to be rendered visually as well as audibly.

Below is a fully working SSML string using Firebase Storage that applies both of the techniques mentioned above:

<speak>
<audio src="https://firebasestorage.googleapis.com/v0/b/repeater-96d05.appspot.com/o/digital_watch_alarm_long.ogg?alt=media&amp;token=cdf4d1da-1d1f-42eb-a478-3912275d0f37">
</audio>
text
</speak>
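
If the SSML is built in code rather than typed by hand, the escaping can be left to an XML helper. The following is only a minimal sketch, assuming Python and its standard library; the function name `audio_ssml` and the fallback-text parameter are hypothetical and not part of any Actions on Google API. It takes the raw link as copied from Firebase (with a plain &) and produces SSML with the & escaped.

```python
from xml.sax.saxutils import escape, quoteattr


def audio_ssml(url: str, fallback_text: str = "") -> str:
    """Wrap a raw audio URL in SSML (hypothetical helper, not an official API)."""
    # quoteattr() escapes &, < and > and adds the surrounding double quotes,
    # so "&token=" in the raw URL becomes "&amp;token=" in the attribute.
    src = quoteattr(url)
    # Text after </audio> is spoken after the clip and gives the response
    # something that can be rendered visually.
    body = escape(fallback_text)
    return f"<speak><audio src={src}></audio>{body}</speak>"


if __name__ == "__main__":
    raw_url = (
        "https://firebasestorage.googleapis.com/v0/b/enrich-58fdf.appspot.com"
        "/o/xxx.mp3?alt=media&token=aabcd430-9d46-45f6-ad21-fdca0895123f"
    )
    print(audio_ssml(raw_url, "text"))
```

Pass the unescaped URL only. A URL that already contains &amp; would be escaped a second time and would no longer resolve.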

mattcarrollcode

@matthewanye Do you mean to put the "text" inside of the audio tags? If you have them outside, after the audio is played it will then speak out the text. If they are inside the audio tags, they will not be spoken unless the audio file cannot be retrieved. – SysCoder Jan 03 '18 at 00:26
Everything you said is correct. What I was trying to communicate is that Actions on Google requires the response to be rendered visually. If you don't specify the `display_text` attribute elsewhere in your response, some text to be spoken needs to be present in the SSML itself. In the example above, the easiest way to do this is to add some text inside the `<speak>` tags and outside of the `<audio>` tags. – mattcarrollcode Jan 03 '18 at 04:03
@matthewayne That's what I did. And about the text, I was able to use the audio without any text. – Raghav Joshi Jan 03 '18 at 14:28

score 2 · Answer 2 · answered Jan 03 '18 at 14:56

As both you and @matthewayne noted (and as I noted in a different answer you reference), you need to escape the & to use proper XML formatting, so it needs to be &amp;.

However - I don't see a problem. I used this exact code in the Dialogflow "Text Response" area, and it works without problems:

<speak><audio src="https://firebasestorage.googleapis.com/v0/b/enrich-58fdf.appspot.com/o/welcome.mp3?alt=media&amp;token=aabcd430-9d46-45f6-ad21-fdca0895123f"></audio></speak>

I've also tested this as part of a Simple Response setting through Dialogflow, and it works fine.

Prisoner

For that **exact** URL and layout? You cut and pasted it? – Prisoner Jan 18 '18 at 11:37
I just copied the code (the code you gave) and pasted it into the text response, but it didn't work. – Raghav Joshi Jan 18 '18 at 14:47
I just re-tested, and it's definitely still working. What is it doing/not doing for you? What shows up in the console of the simulator? What devices are you testing it with? – Prisoner Jan 18 '18 at 15:08

score 1 · Answer 3 · answered Jan 02 '18 at 15:20

FWIW: Are you sure that your audio files have been recorded in an acceptable format? I had to use a tool called Audacity to convert some audio clips I had to one of the formats acceptable to the AoG platform:

Format: MP3 (MPEG v2); 24K samples per second; 24K ~ 96K bits per second, fixed rate

Format: Opus in Ogg; 24K samples per second (super-wideband); 24K - 96K bits per second, fixed rate

Format (deprecated): WAV (RIFF); PCM 16-bit signed, little endian; 24K samples per second

For all formats: Single channel is preferred, but stereo is acceptable. 120 seconds maximum duration. 5 megabyte file size limit. Source URL must use HTTPS protocol.
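
The limits that apply to all formats are easy to check before uploading. Here is a rough sketch, assuming Python; the function `check_audio` is hypothetical, the size and duration are values you supply yourself (the code does not inspect the file), and reading "5 megabyte" as 5 * 1024 * 1024 bytes is my assumption.

```python
from urllib.parse import urlparse

MAX_DURATION_SECONDS = 120
# Assumption: "5 megabyte" is read as 5 MiB; the exact byte count is not stated.
MAX_SIZE_BYTES = 5 * 1024 * 1024


def check_audio(url: str, size_bytes: int, duration_seconds: float) -> list:
    """Return a list of problems; an empty list means no limit was exceeded."""
    problems = []
    # The source URL must use HTTPS, so a gs:// or http:// link is rejected.
    if urlparse(url).scheme != "https":
        problems.append("source URL must use HTTPS")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append("duration exceeds 120 seconds")
    if size_bytes > MAX_SIZE_BYTES:
        problems.append("file is larger than 5 megabytes")
    return problems


if __name__ == "__main__":
    # A gs:// link like the one quoted in the question comments fails the HTTPS check.
    print(check_audio("gs://enrich-58fdf.appspot.com/welcome.mp3", 200_000, 8.0))
```

This only covers the URL scheme, duration, and size. The sample rate, bit rate, and codec still have to be checked in a tool such as Audacity.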

Yes, I think my audio meets every requirement listed above. – Raghav Joshi Jan 03 '18 at 14:29
