secret sauce of below : pointA - pointB is zero if both points have same value ... that is numerically do a pointA minus pointB ... below leverages this to identify at what file byte index offset gives us this zero value when comparing the raw audio curves from a pair of input files ... or an close to zero in a relative sense if both source audio are different even slightly
approach is open up both files and pluck out the raw audio curve of each file ... define two variables bestSum and currentSum, set both to MAX_INT_VALUE ( any arbitrary high value ) ... iterate across the both files simultaneously and obtain the integer value of the current raw audio curve level of file A do same on other file B ... for each such integer just subtract the integer from file A from integer from file B ... continue this loop until you have reached end of one file ... inside of above loop add to currentSum variable the current value of the above mentioned subtraction ... at bottom of above loop update bestSum to become currentSum if currentSum < bestSum also store current file index offset ...
create an outer loop which does a repeat all of above by introducing an offset in time of one file then relaunch above inner loop ... your common audio is when you are using the offset which has the minimum total sum value .. that is the offset when you encountered bestSum
do not start coding until you have gained intuition that above makes perfect sense
I highly encourage you to plot out the curve of the raw audio for one file to confirm you are accessing this sequence of integers ... do this before attempting above algorithm
it will help to visualize above by viewing each input source audio as a curve and you simply keep one curve steady as you slide the other audio curve left or right until you see the curve shapes match or get very close to matching