1

I want to make a simple .wav player in C# for learning purposes. I want to get more insight into how audio is stored and played on the computer, so I want to play a .wav manually rather than with a simple call to a built-in function.

I've looked at the structure of .wav files and found some great resources. What I've found is that the .wav format stores the sound data starting from the 44th byte. The preceding bytes contain data about channels and sample rates, but that is not relevant to my question.
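For reference, here is roughly how I picture reading the header and getting to that 44th byte (just a sketch that assumes the canonical 44-byte PCM header with no extra chunks; "test.wav" is only a placeholder):

    using System;
    using System.IO;

    // Sketch: skip through the canonical 44-byte PCM header (assumes no extra chunks).
    using var reader = new BinaryReader(File.OpenRead("test.wav"));

    reader.ReadBytes(22);                        // "RIFF", size, "WAVE", "fmt ", fmt size, audio format
    short channels      = reader.ReadInt16();    // bytes 22-23
    int   sampleRate    = reader.ReadInt32();    // bytes 24-27
    reader.ReadBytes(6);                         // byte rate (4 bytes) + block align (2 bytes)
    short bitsPerSample = reader.ReadInt16();    // bytes 34-35
    reader.ReadBytes(8);                         // "data" marker + data chunk size
    byte[] soundData = reader.ReadBytes((int)(reader.BaseStream.Length - 44)); // the part my question is about

    Console.WriteLine($"{channels} channel(s), {sampleRate} Hz, {bitsPerSample}-bit, {soundData.Length} data bytes");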

I found that this data is a soundwave. As far as I know, the height of a sample of a soundwave represents its frequency. But I don't get where the timbre comes from. If I only played sounds at the correct frequency for the correct amount of time, I would get beeps. I could play them simply with System.Console.Beep(freq, duration); but you could hardly call that music.

I have tried looking through multiple resources, but they only described the metadata and didn't cover what exactly is in the sound byte stream. I found a similar question and answer on this site, but it doesn't really answer that question; it is not even marked accepted, I believe for that reason.

What exactly is the data in the wave byte stream, and how can you turn that into a sound actually played on the computer?

66Gramms
  • 769
  • 7
  • 23

4 Answers

2

You are mistaken: the height of a sample does not represent a frequency. As a matter of fact, the .wav format doesn't use frequencies at all. .wav basically works the following way:

  • An analog signal is sampled at a specific frequency. A common frequency for wav is 44,100 Hz, so 44,100 samples will be created each second.
  • Each sample contains the height the analog signal has at the sampling instant. A common .wav format is the 16-bit format. Here, 16 bits are used to store the height of the signal.
  • This all occurs separately for each channel.

I'm not sure in which order the data is stored, but maybe some of the great resources you found will help you with that.
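As a rough sketch (assuming the common case of 16-bit little-endian PCM, with the channels interleaved sample by sample), converting the raw byte stream into signed sample values could look like this:

    using System;

    // Sketch: interpret the raw data bytes (after the header) as 16-bit signed samples.
    // Assumes 16-bit little-endian PCM; in a stereo file the samples alternate left, right, left, right...
    static short[] ToSamples(byte[] byteStream)
    {
        var samples = new short[byteStream.Length / 2];
        for (int i = 0; i < samples.Length; i++)
            samples[i] = BitConverter.ToInt16(byteStream, i * 2); // two bytes per sample
        return samples; // values from -32768 to 32767: the "height" of the signal at each instant
    }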

Xaver
  • 1,035
  • 9
  • 17
  • And so what does the height of the signal (those 16 bits, one sample) represent exactly? How would you make those 16 bits into a unique sound? – 66Gramms Nov 27 '21 at 20:49
1

Adding to the above answer: the height of the sample is the volume when played back. It represents how far backward or forward the speaker cone is pulled or pushed to re-create the vibration.

The timbre you refer to comes from the mix of frequencies present in the audio wave, not from any single frequency.

There is a lot going on in audio: a simple drumbeat will produce sound at several frequencies, including harmonics, i.e. repeated vibrations at different frequencies. All of this is off topic for a programming site, so you will need to research sound, frequencies and perhaps DSP.

What you need to know from a computer's perspective is that sound is stored as samples taken at a set rate; as long as we sample at twice the highest frequency of the sound we wish to capture, we will be able to reproduce the original. The samples record the current level (volume) of the audio at that moment in time; turning the samples back into audio is the job of the digital-to-analogue converter found on your sound card.

The operating system looks after passing the samples to the hardware via the appropriate driver. On Windows, WASAPI and ASIO are two APIs you can use to pass the audio to the sound card. Look at open-source projects like NAudio to see the code required to call these operating system APIs.
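As a rough sketch (assuming NAudio is installed from NuGet; the file name is only a placeholder), playing a .wav through the default output device looks something like this:

    using NAudio.Wave;

    // Minimal NAudio sketch: the library parses the header and streams the samples to the OS.
    using var reader = new AudioFileReader("test.wav");   // placeholder path
    using var output = new WaveOutEvent();
    output.Init(reader);
    output.Play();
    while (output.PlaybackState == PlaybackState.Playing)
        System.Threading.Thread.Sleep(100);               // keep the process alive until playback finishes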

I hope this helps; I suspect the topic is broader than you first imagined.

Max Healey
  • 84
  • 5
1

For anyone who wants more clarification: each sample in the data section of the .wav file (one or more bytes, e.g. two bytes for 16-bit audio) represents the amplitude (volume) of the signal at that instant. Samples are played at a certain frequency defined near the start of the .wav file. A common frequency is 44,100 Hz, which is used for CD-quality audio; 44,100 Hz means that a sample is played every 1/44,100 of a second.

If the samples rose linearly, the sound produced would get louder as the file played. Regular musical sounds can be generated by using cyclical changes in amplitude, such as sine and saw waves.
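For instance, here is a sketch that fills one second's worth of 16-bit samples with a 440 Hz sine wave at 44,100 Hz (the frequency and volume are arbitrary illustrative choices):

    using System;

    const int sampleRate = 44100;            // samples per second
    const double freq = 440.0;               // pitch of the tone, arbitrary choice
    short[] samples = new short[sampleRate]; // one second, mono

    for (int n = 0; n < samples.Length; n++)
    {
        double t = (double)n / sampleRate;   // time of this sample in seconds
        samples[n] = (short)(Math.Sin(2 * Math.PI * freq * t) * short.MaxValue * 0.5); // half volume
    }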

Complex sounds such as drums, which produce multiple frequencies, are created by adding the individual sine and cosine waves together into a single wave, known as a Fourier series. As an earlier answer said, this process is repeated for each channel; multiple channels are normally used to create stereo or surround sound.
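A sketch of that idea, mixing a fundamental with two overtones by simply adding the sine waves sample by sample (the frequencies and weights are made up for illustration):

    using System;

    const int sampleRate = 44100;
    double[] harmonics = { 260, 520, 780 };   // fundamental plus two overtones (illustrative values)
    double[] weights   = { 1.0, 0.5, 0.25 };  // each overtone quieter than the last

    short[] samples = new short[sampleRate];  // one second, mono
    for (int n = 0; n < samples.Length; n++)
    {
        double t = (double)n / sampleRate;
        double sum = 0;
        for (int k = 0; k < harmonics.Length; k++)
            sum += weights[k] * Math.Sin(2 * Math.PI * harmonics[k] * t); // add the partial waves
        samples[n] = (short)(sum / harmonics.Length * short.MaxValue * 0.8); // scale down to avoid clipping
    }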

1

Some physics and math behind sound. I hope this also helps.

Sound, as we human beings perceive it, is simply a vibration of energy in air. The higher the frequency of the vibration, the higher the pitch we perceive. When a musical key (say middle C on a piano) is hit, it creates a vibration in the air that moves up and down about 260 times a second. The higher the amplitude of this vibration, the louder we perceive it.

The difference between the same pitch played on a piano and on a violin is that, besides the main vibration wave, there are many small harmonic waves embedded in it. The different combinations of harmonic waves make the different sounds of an identical pitch (from a piano, or violin, or trumpet...).

Back to the main story: as other people described, the .wav format only stores the amplitude (energy level) of each sample, from which you can plot an amplitude-versus-time graph. This is also known as the time domain of a waveform. You can then use a mathematical tool called the Fourier transform to convert the time domain into the frequency domain, which tells you which frequencies are present in the waveform and how strongly each one contributes.
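As a sketch of the analysis direction, here is a naive discrete Fourier transform in C# (fine for understanding the idea, but far too slow for real use; in practice an FFT library would be used):

    using System;

    // Naive DFT: for each frequency bin k, measure how strongly that frequency is present.
    // Bin k corresponds to k * sampleRate / samples.Length Hz.
    static double[] Magnitudes(short[] samples)
    {
        int n = samples.Length;
        var magnitudes = new double[n / 2];         // bins above n/2 mirror the lower ones
        for (int k = 0; k < magnitudes.Length; k++)
        {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++)
            {
                double angle = 2 * Math.PI * k * t / n;
                re += samples[t] * Math.Cos(angle); // correlate with a cosine at this frequency
                im -= samples[t] * Math.Sin(angle); // ...and with a sine
            }
            magnitudes[k] = Math.Sqrt(re * re + im * im) / n;
        }
        return magnitudes;
    }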

For example, if you want to synthesize the sound of a violin playing middle C, first you need to analyse the distribution of its harmonic waves, then construct the frequency domain consisting of the main 260 Hz plus a series of harmonic frequencies derived from its overtones. Use the inverse Fourier transform to convert the frequency domain back to the time domain, and store the data in .wav format.
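And a sketch of that last step: writing mono 16-bit samples out with the same 44-byte header layout mentioned in the question (the samples and path are whatever you generated earlier):

    using System;
    using System.IO;
    using System.Text;

    // Sketch: write mono 16-bit PCM samples as a minimal .wav file with a 44-byte header.
    static void WriteWav(string path, short[] samples, int sampleRate)
    {
        using var w = new BinaryWriter(File.Create(path));
        int dataSize = samples.Length * 2;                       // 2 bytes per 16-bit sample

        w.Write(Encoding.ASCII.GetBytes("RIFF")); w.Write(36 + dataSize);
        w.Write(Encoding.ASCII.GetBytes("WAVE"));
        w.Write(Encoding.ASCII.GetBytes("fmt ")); w.Write(16);   // PCM fmt chunk is 16 bytes long
        w.Write((short)1);                                       // audio format: 1 = PCM
        w.Write((short)1);                                       // channels: mono
        w.Write(sampleRate);
        w.Write(sampleRate * 2);                                 // byte rate = sampleRate * blockAlign
        w.Write((short)2);                                       // block align = channels * bitsPerSample / 8
        w.Write((short)16);                                      // bits per sample
        w.Write(Encoding.ASCII.GetBytes("data")); w.Write(dataSize);
        foreach (short s in samples) w.Write(s);                 // the sound data, starting at byte 44
    }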

Ramon Chan
  • 21
  • 2