What does sound data look like?

Question

I read here about how sounds are represented as numbers in a computer.

As I understand it, the usual representation is 44,100 numbers per second, each between -32768 and 32767.

So, as I imagine it, this must be one big one-column matrix, right?

I'm an R user, so in R terms, 3 seconds of sound data would be:

s <- 3                                          # duration in seconds
sound <- matrix(0, ncol = 1, nrow = 44100 * s)  # 44,100 samples per second, one column
nrow(sound)
#> [1] 132300

That is a one-column matrix with 132,300 rows.
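To see actual numbers in that column rather than zeros, one could fill it with a synthetic signal. The sketch below assumes a pure 440 Hz sine tone (an arbitrary, hypothetical choice) scaled into the 16-bit integer range; a real recording is far less regular, but the shape of the data is the same.

```r
sample_rate <- 44100L   # samples per second
duration_s  <- 3        # seconds
freq_hz     <- 440      # hypothetical tone frequency (concert A)

n      <- sample_rate * duration_s             # 132,300 samples
time_s <- (0:(n - 1)) / sample_rate            # time of each sample in seconds
x      <- sin(2 * pi * freq_hz * time_s)       # amplitude between -1 and 1

# Scale to the 16-bit integer range and store as a one-column matrix
sound <- matrix(as.integer(round(x * 32767)), ncol = 1)

nrow(sound)      # 132300
head(sound[, 1]) # the first few sample values
```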

Is this really the case?

I want an analogous picture in my head. For example, for a 256 x 256 image, splitting it into its RGB channels gives 3 matrices, each 256 x 256.

And for sound, we just get one very long column? Thinking about it again, it's not even really a matrix. It's a single column.
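The image analogy can be written down in R directly. This is a sketch with all-zero placeholder data: an RGB image is a 3-D array, a mono sound is a single column of numbers, and multi-channel sound typically adds one column per channel (the `left`/`right` column names are hypothetical labels, not a standard).

```r
# RGB image: 256 x 256 pixels, one 256 x 256 layer per colour channel
img <- array(0, dim = c(256, 256, 3))
dim(img)          # 256 256 3

# Mono sound: 3 seconds at 44,100 samples per second, a single column of numbers
mono <- numeric(44100 * 3)
length(mono)      # 132300

# Stereo sound: same duration, one column per channel
stereo <- matrix(0, nrow = 44100 * 3, ncol = 2,
                 dimnames = list(NULL, c("left", "right")))
dim(stereo)       # 132300 2
```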

Am I right? I couldn't find any similar dataset by searching the Internet.

Any advice is welcome. Thanks.

score 0 · Answer 1 · answered Oct 22 '19 at 04:37

The raw format created early in that linked question could look a lot like a one-dimensional array, and the signal sent to the speaker to make the sound could probably be represented similarly.

But you're unlikely to find a file on your computer that looks like that for several reasons:

1. Sound can be stored at different bit depths, that is, how many bits are used for each 'number'. CD audio tracks have a 16-bit depth, but you could have 8 or 32 bits, etc. In a plain stream of these numbers you need some way to know how far to read to get to the next number, so that information needs to be saved somewhere.
2. The sample rate can vary. If you've got a sequence of numbers representing an audio signal, you need to know how long each number lasts.
3. Most sounds are more complex. Instead of a single source, you have stereo, or 5 channels, or whatever, so the system needs to be able to store and decode multiple values for the sounds you want to hear at a particular moment.
4. Much of sound is repetitive, so it can often benefit from compression.
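The arithmetic behind the first three points can be sketched in R. Bit depth changes the range of each number, not how many numbers there are; the sample rate and the channel count decide how many numbers there are per second. This assumes signed integer PCM samples, and the helper names are hypothetical; real formats vary (8-bit WAV samples, for instance, are unsigned, and 32-bit audio can be floating point).

```r
# Range of a signed integer sample with the given bit depth
sample_range <- function(bits) c(min = -2^(bits - 1), max = 2^(bits - 1) - 1)

# Duration in seconds from a sample count and a sample rate
duration_s <- function(n_samples, sample_rate) n_samples / sample_rate

# Uncompressed data rate: samples per second x bytes per sample x channels
bytes_per_second <- function(sample_rate, bits, channels) {
  sample_rate * (bits / 8) * channels
}

sample_range(16)                     # min -32768, max 32767
duration_s(132300, 44100)            # 3 seconds
duration_s(132300, 48000)            # same numbers, about 2.76 seconds
bytes_per_second(44100, 16, 2)       # 176400 bytes per second (CD audio)
bytes_per_second(44100, 16, 2) * 60  # 10584000 bytes, about 10.6 MB per minute
```

The same 132,300 numbers mean a different duration at a different sample rate, which is why the rate has to be stored alongside them.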

So most sound files are stored in a compressed format that includes wrapper information about how to decode them. The wrapper information includes how to decode the different audio channels, what sort of compression was used, etc.

The closest you're likely to find is a .wav file (Windows) or an .aiff file (Mac). But even these include some metadata (sample rate and bit depth, to start with).

Thanks for the answer. But still, reason 1 can be addressed by controlling the number of rows: 2^8 rows for 8 bit, 2^32 rows for 32 bit, and so on. Reason 2 can also be addressed by increasing the number of rows. Reason 3 can be addressed with multiple arrays, perhaps extended to a 2-column matrix for stereo or a 5-column matrix for 5 channels. I want to see the numbers, so that I can do some analysis or transformation with them. How do people do that if the data is encoded in a `.wav` file? — HyeonPhil Youn, Oct 22 '19 at 04:51
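A `.wav` file can be turned into exactly that kind of matrix. Packages exist for this (tuneR's `readWave()` is one), but to show what is actually inside the file, here is a base-R sketch using `writeBin()`/`readBin()`. It assumes the simplest layout: uncompressed 16-bit PCM with the canonical 44-byte header and the `data` chunk directly after the `fmt ` chunk. Real files can carry extra chunks (e.g. `LIST`) or other encodings that this does not handle. `write_wav16` and `read_wav16` are hypothetical helpers, not a library API.

```r
# Hypothetical helper: write 16-bit PCM samples (rows = sample frames,
# columns = channels) as a minimal WAV file with the canonical 44-byte header.
write_wav16 <- function(samples, sample_rate, path) {
  channels  <- ncol(samples)
  data_size <- length(samples) * 2L                  # 2 bytes per 16-bit sample
  con <- file(path, "wb")
  on.exit(close(con))
  w32 <- function(x) writeBin(as.integer(x), con, size = 4, endian = "little")
  w16 <- function(x) writeBin(as.integer(x), con, size = 2, endian = "little")
  writeBin(charToRaw("RIFF"), con); w32(36 + data_size); writeBin(charToRaw("WAVE"), con)
  writeBin(charToRaw("fmt "), con); w32(16)          # format chunk, 16 bytes long
  w16(1)                                             # audio format 1 = uncompressed PCM
  w16(channels)
  w32(sample_rate)
  w32(sample_rate * channels * 2)                    # byte rate
  w16(channels * 2)                                  # block align: bytes per frame
  w16(16)                                            # bits per sample
  writeBin(charToRaw("data"), con); w32(data_size)
  w16(t(samples))                                    # interleave frames: L, R, L, R, ...
}

# Hypothetical helper: read such a file back into a frames x channels matrix.
read_wav16 <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  id  <- function() rawToChar(readBin(con, "raw", 4))
  r32 <- function() readBin(con, "integer", n = 1, size = 4, endian = "little")
  r16 <- function() readBin(con, "integer", n = 1, size = 2, endian = "little")
  stopifnot(id() == "RIFF"); r32(); stopifnot(id() == "WAVE")  # r32(): RIFF size, unused
  stopifnot(id() == "fmt ")
  stopifnot(r32() == 16)                             # plain PCM format chunk only
  audio_format <- r16()
  channels     <- r16()
  sample_rate  <- r32()
  r32(); r16()                                       # byte rate and block align (derivable)
  bits <- r16()
  stopifnot(audio_format == 1, bits == 16)           # only 16-bit PCM is handled
  stopifnot(id() == "data")                          # assumes no extra chunks before data
  n_bytes <- r32()
  x <- readBin(con, "integer", n = n_bytes / 2, size = 2, endian = "little")
  list(sample_rate = sample_rate, bits = bits,
       samples = matrix(x, ncol = channels, byrow = TRUE))   # one row per frame
}

# Usage: 1 second of a 440 Hz tone on the left channel, silence on the right
sr     <- 44100L
tone   <- as.integer(round(sin(2 * pi * 440 * (0:(sr - 1)) / sr) * 32767))
stereo <- cbind(left = tone, right = 0L)
path   <- tempfile(fileext = ".wav")
write_wav16(stereo, sr, path)
wav <- read_wav16(path)
dim(wav$samples)    # 44100 2
wav$sample_rate     # 44100
file.size(path)     # 44 header bytes + 44100 * 2 * 2 data bytes = 176444
```

Once the samples are in a matrix, column 1 is the left channel and column 2 the right, and ordinary R operations (scaling, filtering, `fft()`) apply to them directly.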
