I'm assuming you mean two sources from different URLs: one carrying music and one carrying speech. If you don't know how to use Soap, you could pull in both the speech and music streams with a third-party application such as SAM Broadcaster.
SAM will decode both streams and mix them like a conventional audio mixer, then re-encode the result and send it to a single Icecast server as one stream.
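To make the "mix like a conventional audio mixer" step concrete, here is a minimal sketch of what mixing two already-decoded streams amounts to. It is not how SAM Broadcaster is implemented internally; the function name `mix_pcm16`, the gain parameters and the sample values are all assumptions for illustration, and it assumes both streams have been decoded to signed 16-bit PCM at the same sample rate.

```python
def mix_pcm16(music, voice, music_gain=0.5, voice_gain=1.0):
    """Mix two sequences of signed 16-bit PCM samples into one.

    Hypothetical helper: gains and names are illustrative only.
    """
    length = max(len(music), len(voice))
    mixed = []
    for i in range(length):
        # Treat the shorter stream as silence once it runs out.
        m = music[i] if i < len(music) else 0
        v = voice[i] if i < len(voice) else 0
        # Sum the scaled samples, as an analogue mixer sums signals.
        sample = int(round(m * music_gain + v * voice_gain))
        # Clamp to the 16-bit range so loud passages clip instead of wrapping.
        mixed.append(max(-32768, min(32767, sample)))
    return mixed


if __name__ == "__main__":
    music = [1000, 2000, -3000]   # made-up decoded music samples
    voice = [500, -500]           # made-up decoded voice samples
    print(mix_pcm16(music, voice))
```

Lowering `music_gain` while you talk is the usual "ducking" trick, so the voice stays on top of the music in the single stream that gets re-encoded.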
Keep in mind that if you are doing voice-overs, there will be latency to deal with: the final listener will hear your speech slightly after the part of the audio you were speaking to. How much later depends on the buffer lengths involved. It happens because SAM Broadcaster is 'listening' to the audio at the same point you are (assuming you are speaking along to the source audio stream), so by the time your voice reaches the mixer the music has already moved on. On top of that, you need to add the playback buffer SAM has to fill before it can play your voice stream, mix it and pass it on.
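As a rough back-of-the-envelope sketch of that delay, the arithmetic looks something like the following. Every name and number here is an assumption made up for illustration; real values depend on your encoder, the network, and the buffer settings in your player and in SAM.

```python
def voice_lag_ms(voice_path_ms, mixer_voice_buffer_ms,
                 speaker_player_buffer_ms=0, mixer_music_buffer_ms=0):
    """Estimate how far the voice lands behind the music it refers to.

    All parameters are hypothetical, in milliseconds:
      voice_path_ms            encoding + network delay of the voice stream
      mixer_voice_buffer_ms    buffer the mixer fills before playing the voice
      speaker_player_buffer_ms buffer of the player the speaker listens on
      mixer_music_buffer_ms    buffer the mixer uses for the music stream
    """
    # If speaker and mixer hear the music at the same point, this is zero.
    monitoring_offset = speaker_player_buffer_ms - mixer_music_buffer_ms
    return monitoring_offset + voice_path_ms + mixer_voice_buffer_ms


if __name__ == "__main__":
    # Made-up example: 1.5 s voice path and a 2 s voice buffer in the mixer.
    print(voice_lag_ms(1500, 2000), "ms behind the music")
```

With those made-up figures the voice would come in about 3.5 seconds after the passage you were reacting to, so it is worth measuring your own setup before going live.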