
Heads up, this is my first real programming project, but I'm really dedicated to making it work and would love some input.

I've written a program that records sound with PyAudio and halts the recording once the sound intensity drops below a certain threshold. To do this, I take the audio data, unpack it into integers, average each chunk of data the program collects, and halt once that average drops below a threshold I picked after some trial and error. The problem is that the chunk averages don't seem to actually correlate with the intensity of the audio input: the recording sometimes drops below the threshold and halts even when there is significant input (e.g., constant music playing). Below is the code:

import pyaudio
import struct
import wave

def record(outputFile):
    # Defining audio variables
    chunk = 1024
    FORMAT = pyaudio.paInt16
    CHANNELS = 1
    RATE = 44100
    Y = 100

    # Calling pyaudio module and starting recording
    p = pyaudio.PyAudio()

    stream = p.open(format=FORMAT,
                channels=CHANNELS, 
                rate=RATE, 
                input=True,
                frames_per_buffer=chunk)

    stream.start_stream()
    print("Starting!")

    # Recording data until under threshold
    frames = []

    while True:
        # Converting chunk data into integers
        data = stream.read(chunk)
        data_int = struct.unpack(str(2 * chunk) + 'B', data)
        # Finding average intensity per chunk
        avg_data = sum(data_int) / len(data_int)
        print(avg_data)
        # Recording chunk data
        frames.append(data)
        if avg_data < Y:
            break

    # Stopping recording
    stream.stop_stream()
    stream.close()
    p.terminate()
    print("Ending recording!")

    # Saving file with wave module
    wf = wave.open(outputFile, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()

record('outputFile1.wav')
Twanski94
  • when I print out data_int the integers are not the raw audio curve ... raw audio is simply a series of numbers ( could be either integer or floating point ) which represent the current height of the curve on Y axis ... you need to focus on getting your data into this raw audio ... keep in mind the notion of bit depth ... typical CD quality audio has a bit depth of 16 bits which gets stored as two bytes in the underlying data structure – Scott Stensland Jun 13 '20 at 12:37
  • Hi @Scott Stensland. Thanks for the reply. When I print the chunks of data, I get numbers between 0 and 255. Are you not getting this? – Twanski94 Jun 13 '20 at 12:43
  • yes and that is evidence of the problem ... you define `FORMAT = pyaudio.paInt16` which is fine so you want a bit depth of 16 ... a 16 bit integer varies from 0 to 65535 not from 0 to 255 ... an 8 bit integer varies from 0 to 255 ... so data_int is a buffer of bytes not 16 bit integers ... you need to change the struct.unpack – Scott Stensland Jun 13 '20 at 13:02
  • Extremely helpful--I'll look into it. Thanks! – Twanski94 Jun 13 '20 at 13:52
  • Cannot figure out how to get the data into 0-65535 integers. Tried to use int.from_bytes() built-in command, but it seems to sum all of the byte data. I tried to use various data types in the struct.unpack function. I'm lost! – Twanski94 Jun 13 '20 at 15:43
  • look at https://stackoverflow.com/questions/4160175/detect-tap-with-pyaudio-from-live-mic also https://stackoverflow.com/questions/54865473/understanding-the-output-from-the-fast-fourier-transform-method and https://stackoverflow.com/questions/59240015/pyaudio-how-do-i-find-the-volume-of-a-certain-range-of-sound-frequencies – Scott Stensland Jun 14 '20 at 03:34
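  • Building on Scott Stensland's comments above, here is a minimal sketch of one way the decoding could be fixed, assuming `paInt16` delivers signed 16-bit little-endian samples: unpack with the `'h'` struct code instead of `'B'`, and measure loudness with an RMS amplitude rather than a plain average (the helper name `rms_level` is illustrative, not from the original post):

```python
import math
import struct

def rms_level(data, chunk=1024):
    # paInt16 produces signed 16-bit little-endian samples, so a buffer of
    # 2*chunk bytes unpacks with the 'h' format code, not 'B' (unsigned bytes).
    samples = struct.unpack('<' + str(chunk) + 'h', data)
    # A plain average of signed samples hovers near zero at any volume;
    # the root-mean-square amplitude actually tracks loudness.
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

  In the recording loop, `if rms_level(data) < Y: break` would then compare a genuine loudness measure against the threshold, although `Y` would need re-tuning, since RMS values for 16-bit audio can range up to 32767 rather than 255.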

0 Answers