Is it easier to play some sound when the processor is in 16-bit mode, or in 32-bit mode?
It makes little difference; it mostly depends on the sound card you are targeting. Modern sound cards will require you to be able to address anywhere in the first 4GiB.
Putting the CPU into unreal mode would be best: it is easy, gives you access to the 4GiB address space and lets you use the BIOS services.
Which format for the sound would be the best?
You can take a look at the MP3 page on Wikipedia to see that, even before getting into the technical details, MP3 decoding is a very lengthy topic.
It presumes knowledge of Fourier series, so you must be comfortable with complex numbers (the theory itself is a basic tool) and with signal processing.
If you have never heard of "sampling", "window function" or "quantization" at all, then it is time to read a handful of undergraduate books.
Of course you must also be proficient at handling binary data; a picture or a table of the fields should be enough for you to write the code.
An MP3 must be decoded into a series of samples.
Some cards support decoding in hardware, I suppose (I never really checked).
The WAV file format is much easier, to the point that most files have the same fixed header (or you can support only that).
It is basically made to be played almost straightforwardly: if you settle on the sampling rate, bits per sample, number of channels etc. and produce WAV files with only the minimal chunks, you can even just seek to a specific offset and stream the data directly to the card.
The WAV format is already a sequence of samples if no compression is used.
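To make the "same fixed header" point concrete, here is a minimal sketch in C that accepts only the common 44-byte layout (RIFF, a 16-byte "fmt " chunk, then "data") and rejects everything else. The names wav_info and wav_parse_minimal are made up for illustration, and the code assumes the file contains no extra chunks before "data"; real files sometimes do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of interest from the minimal 44-byte PCM WAV header.
   All multi-byte values in the file are little-endian. */
struct wav_info {
    uint16_t channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
    uint32_t data_size;      /* bytes of sample data starting at offset 44 */
};

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the header is not the minimal PCM layout. */
int wav_parse_minimal(const uint8_t *hdr, size_t len, struct wav_info *out)
{
    assert(hdr != NULL && out != NULL);
    if (len < 44) return -1;
    if (memcmp(hdr, "RIFF", 4) != 0) return -1;
    if (memcmp(hdr + 8, "WAVE", 4) != 0) return -1;
    if (memcmp(hdr + 12, "fmt ", 4) != 0) return -1;
    if (rd32(hdr + 16) != 16) return -1;   /* plain PCM fmt chunk size */
    if (rd16(hdr + 20) != 1) return -1;    /* format tag 1 = uncompressed PCM */
    if (memcmp(hdr + 36, "data", 4) != 0) return -1;

    out->channels        = rd16(hdr + 22);
    out->sample_rate     = rd32(hdr + 24);
    out->bits_per_sample = rd16(hdr + 34);
    out->data_size       = rd32(hdr + 40);
    return 0;
}
```

If the parse succeeds, the samples start at byte 44 and can be streamed to the card as they are, provided the card is programmed with the same rate, sample width and channel count.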
Is there any publicly available library for that? I haven't found anything useful.
Of course not, why would there be?
Some open source projects to decode MP3 surely exist (it's not SO's duty to point them out for you), but of course none will target the x86 boot environment.
You need to port it - and it may not be trivial.
It starts with finding a compiler that supports real mode and ends with implementing the minimal CRT needed.
Things you also need to be proficient in:
- IRQ
- DMA
- PCI / PCI-E
- IO and MMIO
- Read from disk
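As a small taste of the PCI and port IO items, this sketch builds the 32-bit value used by the legacy PCI configuration mechanism (written to IO port 0xCF8, with the data then read from 0xCFC) and checks a class-code register for an HD Audio device. The function names are invented for this example, and the port accesses themselves are left out because they depend on your environment.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_CONFIG_ADDRESS 0xCF8   /* write the address dword here */
#define PCI_CONFIG_DATA    0xCFC   /* then read/write the register here */

/* Build the address dword for a bus/device/function/register tuple.
   Bit 31 enables the access; the register offset is dword-aligned. */
uint32_t pci_config_address(uint8_t bus, uint8_t dev, uint8_t func,
                            uint8_t offset)
{
    assert(dev < 32 && func < 8);
    return 0x80000000u |
           ((uint32_t)bus  << 16) |
           ((uint32_t)dev  << 11) |
           ((uint32_t)func << 8)  |
           (offset & 0xFCu);
}

/* The dword at offset 0x08 holds: base class (bits 31..24),
   subclass (23..16), programming interface (15..8), revision (7..0).
   Class 0x04 is "multimedia"; subclass 0x03 is an HD Audio device. */
int pci_is_hd_audio(uint32_t class_reg)
{
    return ((class_reg >> 24) & 0xFF) == 0x04 &&
           ((class_reg >> 16) & 0xFF) == 0x03;
}
```

Scanning for a card then amounts to looping over bus, device and function, reading offset 0x08 for each, and remembering the ones that match.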
Now that you know the titanic effort needed, here is some advice:
- Have you considered the old speaker?
You can only generate square waves (it's either on or off, a 1-bit "audio card"), pretty much like the Game Boy.
It may be a good warm-up since it still requires timing your code.
- You can program the SB16 relatively easily.
The SB16 is old, so it's simpler than modern cards.
DOSBox can emulate it.
I once wrote an answer on how to play a WAV file with such a card under DOS.
You just need to port it.
- If you want to use a modern card, find out which ones are present in your system.
There may be two since most Intel chipsets ship with one.
The Intel datasheets for their chipsets always document the Intel Integrated HD Audio PCI device.
Other cards may have proprietary datasheets - finding them is 90% of the work.
- Test inside a VM
This will spare you from innumerable reboots.
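For the PC speaker suggestion above, most of the work is computing a divisor for PIT channel 2, which drives the speaker. The sketch below only does the arithmetic; the function name is hypothetical, and the port writes are listed as comments since how you issue them depends on your setup.

```c
#include <assert.h>
#include <stdint.h>

/* The PIT counts down at roughly 1.193182 MHz. */
#define PIT_INPUT_HZ 1193182u

/* Divisor to load into PIT channel 2 for a square wave of freq_hz.
   The counter is 16 bits wide, so about 19 Hz is the lowest usable tone. */
uint16_t pit_divisor_for(uint32_t freq_hz)
{
    assert(freq_hz >= 19 && freq_hz <= PIT_INPUT_HZ);
    return (uint16_t)(PIT_INPUT_HZ / freq_hz);
}

/* Typical sequence once the divisor is known:
     out 0x43, 0xB6           ; channel 2, lobyte/hibyte, mode 3 (square wave)
     out 0x42, divisor & 0xFF ; low byte
     out 0x42, divisor >> 8   ; high byte
     in  al, 0x61 ; or al, 3 ; out 0x61, al   ; open the gate, enable speaker
   Clearing bits 0 and 1 of port 0x61 silences it again. */
```

Playing a melody is then a matter of changing the divisor at the right moments, which is where timing your code comes in.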
I'm not trying to scare you: playing sound, or multimedia content in general, in assembly is something marvelous and you should not give up, but beware that it is a non-trivial amount of work that assumes proficiency in assembly.
If you were looking for a quick and dirty solution then it's better to give up.
It's up to you to judge yourself and decide the best course of action.