Rob, think of it like this:
PHP is a cake; when you run the script on the server it mixes the ingredients and cooks the cake.
But when the browser displays the page, it's only eating the cake. You cannot eat the first half of the cake while the other half is still cooking. PHP does not natively provide data in "chunks": it's an all-or-nothing delivery of cake from the server to the browser.
Your efforts so far look like you're trying to "chunk" the data/cake, and that simply won't work in PHP unless you employ some (possibly needlessly complex) system of output buffering.
So: send EVERYTHING to the browser and use JavaScript (jQuery) to tell the browser to show one DOM element before another.
Recommendation: simply have PHP output both elements to the browser, then use jQuery to display one (the image) after the other (the sound) has finished playing, using CSS transitions and jQuery event handlers.
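
As a rough illustration of that recommendation, here is a minimal sketch. It assumes the page PHP sends already contains an `<audio id="sound">` and an `<img id="picture">`. Those ids and the helper name `showImageAfterSound` are made up for this example and are not from your code. It uses the plain DOM API rather than jQuery so it stands on its own; with jQuery the equivalent hook would be something like `$("#sound").on("ended", ...)`. Showing the image anyway when the browser refuses to play the sound is also just an assumption about what you'd want.

```javascript
// Hypothetical helper: keep `image` hidden until `audio` has finished playing.
function showImageAfterSound(audio, image) {
  // Hide the image up front. A CSS rule such as
  //   #picture { transition: opacity 1s; }
  // would turn the later change into a fade-in.
  image.style.opacity = "0";

  // "ended" is the standard media event fired when playback finishes.
  audio.addEventListener("ended", function onEnded() {
    audio.removeEventListener("ended", onEnded);
    image.style.opacity = "1";
  });

  // Browsers may refuse playback that was not started by a user gesture.
  // In that case (an assumption for this sketch) just show the image.
  var result = audio.play();
  if (result && typeof result.catch === "function") {
    result.catch(function () {
      image.style.opacity = "1";
    });
  }
}

// Wire it up only when running inside a browser page.
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", function () {
    var audio = document.getElementById("sound");     // assumed id
    var image = document.getElementById("picture");   // assumed id
    if (audio && image) {
      showImageAfterSound(audio, image);
    }
  });
}
```

The point is that PHP's only job here is to print both tags; the ordering happens entirely in the browser, after the whole cake has been delivered.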