
I have a large audio file that I would like to get transcribed. For this, I opted for silence-based splitting: the audio file is divided into chunks wherever there is silence between sentences. However, this takes longer than expected, even for a short audio file.

import os
from threading import Thread

from pydub import AudioSegment
from pydub.silence import split_on_silence

voice = AudioSegment.from_wav(path)  # path to the audio file
chunks = split_on_silence(voice, min_silence_len=500, silence_thresh=voice.dBFS - 14, keep_silence=500)

To process these chunks faster, I tried using a multi-threaded loop, as shown below:

n_threads = len(chunks)
thread_list = []
for thr in range(n_threads):
    thread = Thread(target = threaded_process, args=(chunks[thr],))
    thread_list.append(thread)
    thread_list[thr].start()

for thread in thread_list:
    thread.join()

The function `threaded_process` is supposed to perform the speech-to-text conversion:

def threaded_process(chunks):
    fh = open("recognized.txt", "w+")
    i = 0
    for chunk in chunks:
        chunk_silent = AudioSegment.silent(duration=10)
        audio_chunk = chunk_silent + chunk + chunk_silent
        print("saving chunk{0}.wav".format(i))
        audio_chunk.export("./chunk{0}.wav".format(i), bitrate='192k', format="wav")
        file = 'chunk' + str(i) + '.wav'
        print("Processing chunk " + str(i))
        rec = audio_to_text(file)  # another function that does the actual speech-to-text conversion (IBM Watson Speech to Text API)
        if rec == "Error5487":
            return "Error5487E"
        fh.write(rec + " ")
        os.remove(file)
        i += 1
    fh.close()

But the conversion still appears to run the same way as before, not in parallel, and I also get this error: `[WinError 32] The process cannot access the file because it is being used by another process: 'chunk0.wav'`. Why is this happening?

moonchild_
  • Not the problem, but shouldn't you be using multiprocessing rather than multithreading to increase speed, since this is CPU-bound rather than I/O-bound? See [Multithreading vs Multiprocessing in Python](https://blog.usejournal.com/multithreading-vs-multiprocessing-in-python-c7dc88b50b5b) – DarrylG Nov 14 '20 at 10:44
  • As for multithreading: have a look at https://docs.python.org/3/library/multiprocessing.html. This might be more efficient than using threads in Python, since with threads you have to deal with the global interpreter lock. – G. Sliepen Nov 14 '20 at 10:44
  • Example of audio transcribe using multiprocessing on AWS [How to speed up processing time of AWS Transcribe?](https://stackoverflow.com/questions/51929131/how-to-speed-up-processing-time-of-aws-transcribe/54422693#54422693) – DarrylG Nov 14 '20 at 10:53
  • I've read about the multiprocessing module but I'm unsure how to implement it in my case. – moonchild_ Nov 14 '20 at 11:50
  • @Hana--someone could help you convert your thread version to a multiprocessing version, but you have to provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) of your thread version. You neglected to mention 1) what modules need to be imported (i.e. pydub for AudioSegment, I presume) and 2) how the chunks array is created. – DarrylG Nov 14 '20 at 12:23
  • @DarrylG -Thanks for mentioning, I've edited the question accordingly. – moonchild_ Nov 14 '20 at 14:20
  • @Hana--what module provides audio_to_text? You should show the import for this function so we know the module. In a minimally reproducible example, someone should be able to reproduce what you're getting. – DarrylG Nov 14 '20 at 15:03
  • audio_to_text is implemented using IBM Watson Speech to Text Service and web socket interface – moonchild_ Nov 14 '20 at 15:10
  • @Hana--in that case the actual speech recognition is done in the cloud (not on your local machine); audio_to_text is the function that triggers it. This is a very important detail which should have been in your question. It means you are I/O-limited, not CPU-limited, so threads are the way to go after all. So the problem becomes what's wrong with your threading code. You don't show the import for audio_to_text, so others can't try improving your code. – DarrylG Nov 14 '20 at 15:25
  • @DarrylG-- So does that mean such a function within a thread would change its working? Also, do I have to include any more details? – moonchild_ Nov 14 '20 at 15:45
  • @Hana--"So does that mean such a function within a thread would change its working"--not sure of the question. I was asking for the function so others could reproduce what you're doing then try to fix it from there. – DarrylG Nov 14 '20 at 15:57

1 Answer


In this case multithreading is faster, since the audio transcription is done in the cloud: the recognition calls spend most of their time waiting on the network, and CPython releases the GIL during blocking I/O, so the threads can overlap those waits.

Uses

  • pydub (audio package)
  • speech_recognition (Google Speech Recognition API for audio-to-text)
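
Note that this example uses the free Google Web Speech API (via speech_recognition's recognize_google) rather than the IBM Watson service from the question; the same thread-pool pattern should carry over to any blocking audio_to_text call, Watson included.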

Code

import concurrent.futures      # thread execution manager
import os
from time import time

import wget                    # save url data to file

from pydub import AudioSegment # process speech
from pydub.playback import play
from pydub.silence import split_on_silence

import speech_recognition as sr # speech recognizer

#########################################################
# Related to Data Acquisition
#########################################################
def get_sound_file(url):
    ' Gets data from a url and places into file '
    local_file = wget.download(url) 
    
    return local_file      # name of file data is placed into

def get_nonexistant_path(fname_path):
    """ 
    Generates the next unused file name based upon fname_path.

    Examples
    --------
    >>> get_nonexistant_path('/etc/issue')
    '/etc/issue-1'
    >>> get_nonexistant_path('whatever/1337bla.py')
    'whatever/1337bla.py'
    
    Source: https://stackoverflow.com/questions/17984809/how-do-i-create-a-incrementing-filename-in-python
    """
    if not os.path.exists(fname_path):
        return fname_path
    filename, file_extension = os.path.splitext(fname_path)
    i = 1
    new_fname = "{}-{}{}".format(filename, i, file_extension)
    while os.path.exists(new_fname):
        i += 1
        new_fname = "{}-{}{}".format(filename, i, file_extension)
    return new_fname

def create_files(source_file):
    ' Splits data into multiple files based upon silence'
    sound = AudioSegment.from_wav(source_file)
    
    # Break into segments based upon silence
    segments = split_on_silence(sound, silence_thresh = sound.dBFS - 14)
    
    # Store as separate files
    #https://stackoverflow.com/questions/33747728/how-can-i-get-the-same-bitrate-of-input-and-output-file-in-pydub
    # https://wiki.audacityteam.org/wiki/WAV
    original_bitrate = str((sound.frame_rate * sound.frame_width * 8 * sound.channels) / 1000)
    
    file_list = []
    for audio_chunk in segments:
        # File whose enumeration number has not been used yet
        # i.e. file-1.wav, file-2.wav, ...
        file_list.append(get_nonexistant_path(source_file))                        # Add a file name
        audio_chunk.export(file_list[-1], format ="wav", bitrate=original_bitrate)# use name of last file added
        
    return file_list  # list of files created


#########################################################
# Speech to text
#########################################################
def audio_to_text(filename):
    '''
        Converts speech to text
        based upon blog: https://www.geeksforgeeks.org/audio-processing-using-pydub-and-google-speechrecognition-api/
    '''
    # Get recognizer
    r = sr.Recognizer() 
    
    with sr.AudioFile(filename) as source: 
        audio_listened = r.listen(source) 

        # Try to recognize the listened audio
        # and catch exceptions.
        try:     
            return r.recognize_google(audio_listened) 
            

        # If google could not understand the audio 
        except sr.UnknownValueError: 
            print("Could not understand audio") 
            return None

        # If the results cannot be requested from Google. 
        # Probably an internet connection error. 
        except sr.RequestError as e: 
            print("Could not request results.") 
            return None
      
def process(file):
    '''
        Converts a single audio file to text and writes the result to result.txt
    '''
    with open('result.txt', 'w') as fout:
        transcription = audio_to_text(file)
        if transcription:
            fout.write(transcription + '\n')
            
def process_single(files):
    '''
        Converts multiple audio files to text sequentially, writing the results to result-single.txt
    '''
    with open('result-single.txt', 'w') as fout:
        for file in files:
            transcription = audio_to_text(file)
            if transcription:
                fout.write(transcription + '\n')
                
def process_threads(files):
    '''
        Converts multiple audio files to text using a thread pool, writing the results to result_thread.txt
    '''
    with open('result_thread.txt', 'w') as fout:
        # max_workers=None uses the default thread count,
        # which is min(32, os.cpu_count() + 4) on Python 3.8+
        with concurrent.futures.ThreadPoolExecutor(max_workers = None) as executor:
            for transcription in executor.map(audio_to_text, files):
                if transcription:
                    fout.write(transcription + '\n')
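
A note on the design: executor.map yields results in the same order as the input files, so the transcriptions land in result_thread.txt in chunk order even though the requests run concurrently. If the heavy work were local and CPU-bound rather than cloud calls, swapping ThreadPoolExecutor for concurrent.futures.ProcessPoolExecutor (as suggested in the comments) would be the natural change.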
            

Test Code

if __name__ == "__main__":
    # url of data used for testing
    url = 'http://www.voiptroubleshooter.com/open_speech/american/OSR_us_000_0010_8k.wav'
    
    # download data to local file
    data_file_name = get_sound_file(url)
    
    # place data into chunks based upon silence
    chunk_file_names = create_files(data_file_name)

    # Process single file without partitioning into chunks
    t0 = time()
    process(data_file_name)
    print(f'Running entire audio file elapsed time: {time() - t0:.4f}')
    
    # Single threaded version
    t0 = time()
    process_single(chunk_file_names)
    print(f'Running chunked audio files elapsed time: {time() - t0:.4f}')
        
    # Multiple threaded version
    t0 = time()
    process_threads(chunk_file_names)
    print(f'Running chunked audio files using multiple threads elapsed time: {time() - t0:.4f}') 
            

Timing

Running entire audio file elapsed time: 13.0020
Running chunked audio files elapsed time: 17.8850
Running chunked audio files using multiple threads elapsed time: 3.6400
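
As for the `[WinError 32]` in the question: most likely it happens because every thread in threaded_process starts its own counter at i = 0, so all of them export, read and delete the same chunk0.wav, and Windows reports a sharing violation when one thread touches the file while another still has it open (the shared recognized.txt opened with "w+" in every thread has a similar problem). A minimal sketch of a fix, assuming the question's imports and its Watson-backed audio_to_text helper (not shown here): give every chunk its own index and file name, and write the combined text once from the main thread:

from threading import Thread

def threaded_process(index, chunk):
    # pad with a little silence, as in the question
    padded = AudioSegment.silent(duration=10) + chunk + AudioSegment.silent(duration=10)
    filename = "chunk{0}.wav".format(index)      # unique per chunk, so threads never collide
    padded.export(filename, bitrate='192k', format="wav")
    results[index] = audio_to_text(filename)     # the question's Watson helper
    os.remove(filename)

results = [None] * len(chunks)
threads = [Thread(target=threaded_process, args=(i, c)) for i, c in enumerate(chunks)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with open("recognized.txt", "w") as fh:          # write once, from the main thread
    fh.write(" ".join(r for r in results if r))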
DarrylG
  • This does not answer the question, since it only applies multi-threading to the cloud api calls, not to the `split_on_silence` function. – serg06 Nov 30 '20 at 20:41
  • @serg06--OP used the split_on_silence function to separate the data into chunks. Then multi-threading was applied to the different chunks of data, not to the split_on_silence function. The idea is to get the speed up by processing the chunks in parallel through the cloud API calls, not by trying to speed up the split_on_silence function. – DarrylG Nov 30 '20 at 22:46
  • In the process_threads function where you are opening the file, it should be `fout` instead of `four`. – Dark debo Jun 17 '22 at 14:22
  • @Darkdebo -- thanks for pointing out the typo. – DarrylG Jun 17 '22 at 14:29