I would start the recording thread first and look for the first peak in the signal captured by the mic. This will tell you how many ms after recording started the first sound was detected. For this you probably need to know the sampling rate of the mic etc- here is a good starting point.
The timeline is something like this
---- recording start ------- playback start -------- sound first detected ----
You want to find out how many ms after you start recording a sound was picked up ((first_peak - recording_start)
in the code below), and then subtract the time it took to start the playback ((playback_start - recording_start)
below)
Here's a rough code outline
from datetime import datetime
recording_start, playback_start, first_peak = None, None, None
def play_sound():
nonlocal playback_start
playback_start = datetime.now()
def record():
nonlocal recording_start, first_peak
recording_start = datetime.now()
first_peak = find_peak_location_in_ms() # implement this
Thread(target=record()).start() # note recording starts first
Thread(target=play_sound()).start()
# once the threads are finished
delay = (first_peak - recording_start) - (playback_start - recording_start)
PS one of the other answers correctly points out that you need to worry about the global interpreter lock. You can likely bypass it by using c-level APIs to record/play the sound without blocking other threads, but you may find Python's not the right tool for that job