I read about how sound is represented with numbers in a computer here.
From that, I gather the usual representation is 44,100 numbers per second, each in the range [-32767, 32767].
So I imagine the data must be one big one-column matrix, right?
I'm an R user, so in R terms, 3 seconds of sound data would be:
s <- 3
sound <- matrix(0, ncol = 1, nrow = 44100 * s)
nrow(sound)
#> [1] 132300
That is, a one-column matrix with 132,300 rows.
Is this really the case?
I'd like an analogous mental picture. Take a 256 * 256 image, for example:
if we split that image into its RGB channels, we get 3 matrices, each 256 * 256.
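In R terms, I picture the image case roughly like this. It is only a sketch with all-zero placeholder pixels, and the names `img` and `red` are ones I made up for illustration:

```r
# A 256 x 256 image with 3 colour channels, stored as a 3-D array
# (placeholder values only; a real image would hold pixel intensities)
img <- array(0L, dim = c(256, 256, 3))

# Each channel on its own is an ordinary 256 x 256 matrix
red <- img[, , 1]

dim(img)
dim(red)
is.matrix(red)
```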
And in the case of sound, do we just get one very long column? Thinking about it again, it's not even a matrix after all. It's just a column.
Am I right? I couldn't find any similar dataset by searching the Internet.
Any advice is welcome. Thanks.
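To make the "long column" idea concrete, here is a minimal sketch of what I have in mind. It assumes a single channel at 44,100 samples per second with values in the range above. The 440 Hz tone and the variable names are my own illustration, not taken from any real dataset:

```r
sample_rate <- 44100   # samples per second (assumed rate)
duration <- 3          # seconds
n <- sample_rate * duration

# Time stamp (in seconds) of each sample, starting at 0
t <- (seq_len(n) - 1) / sample_rate

# Hypothetical 440 Hz sine tone, scaled to [-32767, 32767] and rounded
tone <- as.integer(round(32767 * sin(2 * pi * 440 * t)))

length(tone)         # number of samples
is.null(dim(tone))   # TRUE means a plain vector, not a matrix
range(tone)
```

If this picture is right, the sound is just an integer vector whose position encodes time, and the one-column matrix above would be the same numbers with a `dim` attribute added.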