I am creating a program to turn text into speech (TTS
).
What I've done so far is to split a given word into syllables and then play each pre-recorded syllables.
For example:
INPUT: [TELEVISION]
OUTPUT: [TEL - E - VI - SION]
And then the program plays each sound in order:
First: play TEL.wav
Second: play E.wav
Third: play VI.wav
Fourth: play SION.wav
I am using wave
and PyAudio
to play each wav file:
wf = wave.open("sounds/%s.wav" %(ss), 'rb')
p = pyaudio.PyAudio()
stream = p.open(...)
data = wf.readframes(CHUNK)
stream.write(data)
... etc.
Now the problem is that during the playback there is a delay between each audio file and the spoken word sounds unnatural.
Is it possible to mix these audio files without creating a new file and play them with 0.2s delay between each audio file?
Edit: I tried Nullman's solution and it worked better than just calling a new wf on each sound. I also tried putting a crossfade following these instructions.