
I've been inspecting an audio file with scipy.io.wavfile.

It has a frame rate of 44100 Hz, 9745238 frames in total, and 2 channels. The file properties report a duration of 220 seconds, although 9745238 / 44100 gives 220.9804535147392 seconds.

After reading the file I got a 9745238 x 2 matrix of 16-bit signed integers, as expected,

where column 1 holds the channel 1 data and column 2 holds the channel 2 data for each of the 9745238 frames.

So my question is: is there a robust way to find these values (channels 1 and 2 together, i.e. each row of the matrix) per second or per millisecond?

Any ideas?

Edit 1

I've referred to a very intuitive discussion here,

and I guess all I need is the bitrate, which is bitrate = sampleRate * bitDepth. But how can I get the bit depth? Is it the sample size / sample width, or something else?

P.hunter

1 Answer


So, I wanted the number of bits per second. After some research I found that what I needed was the bit rate, i.e. the number of bits per second of audio, and that the bit depth is the number of bits per sample (which is constant for a given file).

To understand this, if we use the wave module to print the first frame of the file, we get something like this:

b'\x00\x00\x00\x00'

As you can see, it is a bytes object of 4 bytes: 2 bytes (16 bits) for each of the 2 channels. It can be converted into 16-bit signed integers using numpy,

like np.frombuffer(wav.readframes(1), np.int16) (np.fromstring is deprecated for binary data),

which gives [0 0], that is, one 16-bit sample for each channel in the first frame.
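The steps above can be tried without the original file. This is a minimal sketch that assumes a 16-bit little-endian stereo WAV; the three frames written into the in-memory buffer are made-up values, not data from the question.

```python
import io
import wave

import numpy as np

# Build a tiny 16-bit stereo WAV in memory so no file on disk is needed.
# These sample values are invented purely for illustration.
frames = np.array([[0, 0], [1, -1], [2, -2]], dtype="<i2")  # 3 frames, 2 channels

buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)       # 2 bytes = 16 bits per sample
    out.setframerate(44100)
    out.writeframes(frames.tobytes())

buf.seek(0)
with wave.open(buf, "rb") as wav:
    n_channels = wav.getnchannels()
    sample_width = wav.getsampwidth()   # bytes per sample
    raw = wav.readframes(1)             # one frame = one sample per channel

# Interpret the raw bytes as little-endian 16-bit signed integers.
first_frame = np.frombuffer(raw, dtype="<i2")
bit_depth = sample_width * 8            # bits per sample

print(raw)          # 4 bytes: 2 channels x 2 bytes
print(first_frame)  # one value per channel
print(bit_depth)
```

So the bit depth asked about in the question is just the sample width in bytes multiplied by 8.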

As the frame rate is 44100, the bit rate per channel is frame_rate (44100) * bitDepth (16) = 705600 bits per second, and we multiply this value by the number of channels (in my case 2) to get 1411200 bits per second for the whole file.
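As a rough check of that arithmetic, here is a small sketch using the numbers from the question. The helpers rows_for_second and rows_for_millisecond are hypothetical names introduced here; they only compute which rows of the frames x channels matrix fall into a given second or millisecond.

```python
frame_rate = 44100      # frames per second, from the question
n_frames = 9745238      # total frames, from the question
n_channels = 2
bit_depth = 16          # bits per sample (2-byte samples)

duration_s = n_frames / frame_rate
bits_per_frame = bit_depth * n_channels
bitrate = frame_rate * bits_per_frame     # bits per second, all channels


def rows_for_second(k, rate=frame_rate, total=n_frames):
    """Return the (start, stop) row range covering second k (hypothetical helper)."""
    start = k * rate
    stop = min(start + rate, total)       # the last second may be shorter
    return start, stop


def rows_for_millisecond(ms, rate=frame_rate, total=n_frames):
    """Return the (start, stop) row range covering millisecond ms (hypothetical helper)."""
    # 44100 / 1000 is not an integer, so use integer division on both ends
    # to keep consecutive ranges contiguous.
    start = (ms * rate) // 1000
    stop = min(((ms + 1) * rate) // 1000, total)
    return start, stop


print(duration_s)
print(bitrate)
print(rows_for_second(220))
print(rows_for_millisecond(0))
```

With the matrix returned by scipy.io.wavfile.read in a variable such as data, the rows for second k would then be data[start:stop], each row holding one value per channel.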

Edit 1

Also, frames and samples are two different things: a frame is made up of one sample per channel, and sample_width is the size of each sample in bytes.

E.g. let _ represent a sample and consider an audio clip with 3 channels and 4 frames; it will be represented something like this:

[_ _ _] [_ _ _] [_ _ _] [_ _ _]

If you view the matrix using the scipy library, then instead of _ there will be numbers whose values are decoded from the raw bytes. To put it another way, each element of the matrix is a sample.

So there are 12 samples in total, and if we suppose the duration of the clip to be 1 second, then the frame rate is 4 Hz and 12 sample values are stored per second across all channels. (Most tools use "sample rate" for the per-channel rate, which equals the frame rate, 4 Hz here.)
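The 3-channel, 4-frame example can be written out as a small array to make the counting concrete. This is only an illustrative sketch: the array is filled with zeros and the 1 second duration is the assumption made above.

```python
import numpy as np

# 4 frames (rows) x 3 channels (columns), the same layout scipy.io.wavfile uses.
audio = np.zeros((4, 3), dtype=np.int16)
duration_s = 1.0                     # assumed duration of the clip

n_frames, n_channels = audio.shape
n_samples = audio.size               # every element of the matrix is one sample

frame_rate = n_frames / duration_s           # frames per second
values_per_second = n_samples / duration_s   # sample values per second, all channels

print(n_frames, n_channels, n_samples)
print(frame_rate, values_per_second)
```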

For more information, you can refer to the answers in these discussions:

  1. sound.stackexchange
  2. theDontOvelookCommentsSection
  3. this one too
P.hunter