I have an audio file that I want to split into multiple files. The audio is structured as pairs of sounds separated by silence. The timeline looks like this, with - representing silence:
-----Sound A1-----Sound A2-----Sound B1-----Sound B2-----
I want to find the boundary between Sound A2 and Sound B1. I would prefer a solution that uses some combination of Python, OpenCV, and FFmpeg, but any tools that work will do.
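
To make the goal concrete, here is a rough sketch of one possible approach, not a tested solution. It uses FFmpeg's `silencedetect` audio filter to log the silent stretches and Python to pick the gap that separates one pair from the next. The function names (`parse_silences`, `detect_silences`, `pair_boundaries`) are made up for illustration. The -30 dB noise threshold and 0.5 s minimum silence length are guesses that would need tuning. The sketch also assumes the file starts with silence, as in the timeline above, and that no sound contains an internal pause longer than the minimum silence length.

```python
import re
import subprocess

# Patterns for the lines that FFmpeg's silencedetect filter writes to stderr.
SILENCE_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
SILENCE_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def parse_silences(log):
    """Return a list of (start, end) times in seconds from silencedetect log text."""
    silences = []
    start = None
    for line in log.splitlines():
        match = SILENCE_START.search(line)
        if match:
            start = float(match.group(1))
            continue
        match = SILENCE_END.search(line)
        if match and start is not None:
            silences.append((start, float(match.group(1))))
            start = None
    return silences


def detect_silences(path, noise_db=-30, min_silence=0.5):
    """Run FFmpeg on `path` and return the silent intervals it reports.

    noise_db and min_silence are assumed starting values, not known-good ones.
    """
    cmd = [
        "ffmpeg", "-hide_banner", "-nostats", "-i", path,
        "-af", f"silencedetect=noise={noise_db}dB:d={min_silence}",
        "-f", "null", "-",
    ]
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE, text=True
    )
    return parse_silences(result.stderr)


def pair_boundaries(silences):
    """Return the midpoint of every gap that separates one pair from the next.

    Assumes the first silence is the lead-in before Sound A1. The last
    reported silence is dropped because it is either the trailing silence
    or a gap inside the final pair.
    """
    interior = silences[1:-1]      # A1|A2, A2|B1, B1|B2, ...
    between_pairs = interior[1::2]  # every second gap: A2|B1, B2|C1, ...
    return [(start + end) / 2 for start, end in between_pairs]
```

With this, `pair_boundaries(detect_silences("input.wav"))` would give candidate cut times (here `input.wav` is a placeholder name), and each time could then be passed to FFmpeg's `-ss` and `-to` options to write the pieces out as separate files.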