
Each WAV file depends on a sampling rate and a bit depth. The former governs how many samples are played per second, and the latter governs how many possible values there are for each time slot.

If the sampling rate is, for example, 1000 Hz and the bit depth is 8, then every 1/1000 of a second the audio device plays one of $2^8$ possible sounds.
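To make that arithmetic concrete, here is a minimal sketch (plain Python; the numbers are just the example values above, not taken from any particular file):

```python
# Example values from the paragraph above (not from a real file), assuming mono audio.
sampling_rate = 1000   # samples played per second
bit_depth = 8          # bits stored per sample

values_per_sample = 2 ** bit_depth                 # 256 possible levels per time slot
bytes_per_second = sampling_rate * bit_depth // 8  # 1000 bytes of audio data per second

print(values_per_sample, bytes_per_second)   # 256 1000
```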

Hence the bulk of the WAV file is a sequence of 8-bit numbers. There is also a header containing the sampling rate, the bit depth, and other specifics of how the data should be read:

[image: xxd hex dump of the start of the WAV file, showing the header]

The above comes from running xxd on a WAV file to view it in binary in the terminal. The first column is just the byte offset, counting up in hexadecimal. The last column (the ASCII view) seems to show where the header ends. So the data looks like this:

[image: xxd hex dump of the data section: a long run of sample bytes]

Each of those 8-bit numbers is a sample. So the device reads left to right and converts the samples, in order, into sounds. But how, in principle, can each number correspond to a sound? I would think each number should somehow encode an amplitude and a pitch, each drawn from a finite range. But I cannot find any reference to, for example, the first half of the bits being a pitch and the second half being an amplitude.

I have found references to the numbers encoding "signal strength" but I do not know what this means. Can anyone explain, in principle, how the data is read and converted to audio?
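For reference, the same inspection can be done without xxd. Here is a minimal sketch using Python's standard `wave` module (the filename `example.wav` is just a placeholder for an 8-bit mono PCM file like the one above); it prints the header fields and the first few raw sample bytes:

```python
import wave

# Placeholder filename: any 8-bit mono PCM WAV file.
with wave.open("example.wav", "rb") as wav:
    print("channels:        ", wav.getnchannels())
    print("sample rate:     ", wav.getframerate())  # samples per second
    print("bytes per sample:", wav.getsampwidth())  # 1 for 8-bit audio

    # After the header, the data chunk is just a run of sample values;
    # for 8-bit PCM each byte is one unsigned amplitude in the range 0..255.
    frames = wav.readframes(10)
    print("first samples:   ", list(frames))
```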

Daron
  • fdcpp gave a good link with a proper explanation. Long story short, a single byte encodes the (quantised) amplitude of a wave. As a single number, it has no physical interpretation. A bunch of points form (part of) a wave, and for such a collection you can compute audio properties such as pitch. You can't have a pitch for a single point, as it is impossible to tell what the waveform would look like. – Lukasz Tracewski Feb 08 '22 at 05:11
  • @fdcpp I have seen those waveform diagrams before. I already know the data is a series of numbers. And the diagram plots the numbers as heights from left to right. But that doesn't give any hints as to how a sequence of numbers -- suitably interpreted of course -- provides a sound. – Daron Feb 08 '22 at 10:49
  • @LukaszTracewski Okay but a wave is defined by frequency as well as amplitude. How can we use a sequence of amplitudes to recover the entire wave? – Daron Feb 08 '22 at 10:51
  • What you might be reaching for is an explanation of the [Nyquist–Shannon sampling theorem](https://en.m.wikipedia.org/wiki/Nyquist-Shannon_sampling_theorem) – fdcpp Feb 08 '22 at 10:55
  • I’d recommend this question be pointed at someone from your institute’s physics, acoustics or music department – fdcpp Feb 08 '22 at 10:56
  • Draw a wave, say a sine wave, for yourself. Did you have to associate a frequency with every "point"? No, that would not even have an interpretation. If I draw the sine for you, and won't tell you what frequency I have used, you will be able to calculate it yourself (analytically or "brute force" with FFT). The same applies to audio. Try it for yourself with code like e.g. here: https://stackoverflow.com/questions/8299303/generating-sine-wave-sound-in-python (answer with the highest score). – Lukasz Tracewski Feb 09 '22 at 15:15
  • @LukaszTracewski The frequency of the wave is just "things per second". What exactly are the "things"? If the wave was given by a spring then the "things" would be extensions and retractions. What is the analogous object for a wav file? – Daron Feb 09 '22 at 15:54
  • 1
    @fdcpp Five minutes after posting I flagged my own question as maybe being more appropriate for the Signal Processing SE. – Daron Feb 09 '22 at 17:07

1 Answer


In your example, over the course of a second, 1000 values are sent to a DAC (digital-to-analog converter), where the discrete values are smoothed out into a continuous waveform. The pitch is determined by the rate and pattern by which the stream of values (which get smoothed out to a wave) rise and fall.
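To make that concrete, here is a minimal sketch (plain Python, standard library only; the 8000 Hz rate, the 440 Hz tone, and the filename are arbitrary choices for illustration, not anything from the question) that writes a stream of 8-bit values rising and falling 440 times per second into a WAV file:

```python
import math
import wave

SAMPLE_RATE = 8000   # samples per second (arbitrary choice for this sketch)
FREQUENCY = 440      # how many times per second the values rise and fall
DURATION = 1.0       # seconds of audio to generate

samples = bytearray()
for n in range(int(SAMPLE_RATE * DURATION)):
    # One number per time slot: a sine wave scaled into the unsigned
    # 0..255 range used by 8-bit WAV data (128 = silence / zero amplitude).
    value = 128 + int(127 * math.sin(2 * math.pi * FREQUENCY * n / SAMPLE_RATE))
    samples.append(value)

with wave.open("tone.wav", "wb") as wav:
    wav.setnchannels(1)            # mono
    wav.setsampwidth(1)            # 1 byte per sample = 8-bit depth
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(bytes(samples))
```

Played back, the DAC smooths those 8000 numbers per second into a continuous wave and you hear a 440 Hz tone; doubling FREQUENCY raises the pitch an octave, and shrinking the 127 makes it quieter.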

Steven W. Smith gives some good diagrams and explanations in his chapter on ADC and DAC in his very helpful book The Scientist and Engineer's Guide to Digital Signal Processing.

Phil Freihofner
  • "The pitch is determined by the rate and pattern by which the stream of values (which get smoothed out to a wave) rise and fall." Can you go into more detail about how this works please? – Daron Feb 08 '22 at 10:47
  • Yes the values determine the pitch (and amplitude, and everything else about the sound) but how exactly do they do this? – Daron Feb 08 '22 at 10:51
  • @Daron That isn't a question about `.wav`s, that is more going into high-school physics and general propagation of waves. I don't think Stack Overflow is the best platform for this question. – fdcpp Feb 08 '22 at 15:30
  • IDK how to answer your question. Our eardrums respond to air pressure differences over time. If the pressure varies back and forth 440 times in a second, we hear the A that orchestras use to tune. The digitized data sketches out this pattern with values, rising and falling over the course of time. If signed PCM values evenly go above and below zero 440 times over the course of 1000 samples (using your sample rate of 1000 samples per second), the various stages (e.g., going through the DAC) will smooth out the data and provide a matching continuous signal at the speaker (see the sketch after these comments). – Phil Freihofner Feb 08 '22 at 22:49
  • @PhilFreihofner So is it correct to say that the y-axis in the image [here](https://stackoverflow.com/questions/13039846/what-do-the-bytes-in-a-wav-file-represent) is the air pressure? – Daron Feb 09 '22 at 12:46
  • @PhilFreihofner And in your example the speaker will vibrate at a rate of 440 times per second -- and this results in air pressure fluctuations of the same frequency near the eardrum? – Daron Feb 09 '22 at 12:47
  • Yes, that's the gist of it. The Y-axis basically represents the amount by which (eventually) the speaker cone will push in or out over time, creating air pressure fluctuations that radiate out and eventually reach the eardrum, which is built to respond to these minute and rapid air pressure fluctuations. – Phil Freihofner Feb 09 '22 at 19:01
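To illustrate the last few comments: the frequency is not stored in any single sample, but it can be recovered from how often the whole stream of samples swings back and forth. Here is a minimal sketch (plain Python; the 440 Hz sine and 8000 Hz rate are made up for the example) that estimates the pitch by counting upward zero crossings:

```python
import math

def estimate_frequency(samples, sample_rate):
    """Crude pitch estimate: count upward zero crossings per second."""
    crossings = 0
    for prev, curr in zip(samples, samples[1:]):
        if prev < 0 <= curr:           # the wave moved from below zero to at/above zero
            crossings += 1
    duration_seconds = len(samples) / sample_rate
    return crossings / duration_seconds   # roughly one upward crossing per cycle

# One second of a 440 Hz sine, as signed values centred on zero.
sample_rate = 8000
samples = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
print(estimate_frequency(samples, sample_rate))   # roughly 440
```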