You have two slightly different problems you're trying to solve.
Radio streams such as the France-Inter stream you're talking about still need to be coordinated with Google directly or with one of the existing streaming services.
For your own audio files, SSML is certainly an issue, but you can do what you want with a Media Response. When a track finishes playing, Dialogflow is called with the Event actions_intent_MEDIA_STATUS, which you can capture with an Intent. Your fulfillment can then send another Media Response with the next song in the playlist.
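As a rough sketch of what the fulfillment for that Intent could return, the function below builds a webhook response containing a Media Response for one track. The playlist entries, URLs, and the names Track and buildMediaResponse are made up for illustration, and the payload layout is my understanding of the Dialogflow v2 webhook format for Actions on Google, so check it against the current documentation before relying on it.

```typescript
// Hypothetical track type and playlist; the URLs are placeholders.
interface Track {
  name: string;
  url: string;
}

const playlist: Track[] = [
  { name: "First song", url: "https://example.com/audio/first.mp3" },
  { name: "Second song", url: "https://example.com/audio/second.mp3" },
];

// Builds the JSON body the webhook would send back to Dialogflow.
// Assumed layout: a simple response first, then the media response,
// plus a suggestion chip because the conversation stays open.
function buildMediaResponse(track: Track) {
  return {
    payload: {
      google: {
        expectUserResponse: true,
        richResponse: {
          items: [
            { simpleResponse: { textToSpeech: `Now playing ${track.name}.` } },
            {
              mediaResponse: {
                mediaType: "AUDIO",
                mediaObjects: [{ name: track.name, contentUrl: track.url }],
              },
            },
          ],
          suggestions: [{ title: "Stop" }],
        },
      },
    },
  };
}
```

In the handler for the Intent that captures actions_intent_MEDIA_STATUS, you would work out which track comes next and return buildMediaResponse(playlist[next]) as the response body.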
Update based on your comment.
To keep track of your position in the playlist, do not use a global variable: Cloud Functions for Firebase does not guarantee that two consecutive calls reach the same instance. There are a few good approaches, which boil down to one of:
- Using a Dialogflow Context to store the location
- Storing the location in a data store (such as a Firestore database), indexed by the session ID, or
- Using the app.data object from the Actions on Google library
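To illustrate the first option, here is a minimal sketch that keeps the playlist index in a Dialogflow Context instead of a global variable. The context name "playlist", the parameter name "index", and the helper names are assumptions for the example; the request fields used (session and queryResult.outputContexts) are the ones I understand the Dialogflow v2 webhook request to carry.

```typescript
// Minimal shapes for the parts of the webhook request used here.
interface DialogflowContext {
  name: string;
  lifespanCount?: number;
  parameters?: { [key: string]: unknown };
}

interface WebhookRequest {
  session: string;
  queryResult: { outputContexts?: DialogflowContext[] };
}

// Hypothetical context ID; Dialogflow lowercases context names.
const CONTEXT_ID = "playlist";

function contextName(req: WebhookRequest): string {
  return `${req.session}/contexts/${CONTEXT_ID}`;
}

// Returns the index of the track to play next: one past the stored
// index, or 0 when no position has been stored for this session yet.
function nextIndex(req: WebhookRequest): number {
  const contexts = req.queryResult.outputContexts || [];
  for (const ctx of contexts) {
    if (ctx.name === contextName(req) && ctx.parameters) {
      const value = ctx.parameters["index"];
      if (typeof value === "number") {
        return value + 1;
      }
    }
  }
  return 0;
}

// Builds the context to include in the response's outputContexts
// so the position is available on the following call.
function positionContext(req: WebhookRequest, index: number): DialogflowContext {
  return { name: contextName(req), lifespanCount: 5, parameters: { index } };
}
```

Because the position travels with the conversation, it does not matter which function instance handles the next call. The same two helpers could be backed by a Firestore document keyed on the session ID if you would rather keep the state on your side.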