This is my first project with Python and I don't understand how it executes my functions.

I am using Pocketsphinx for speech-to-text and have a problem with the function createAudio(videoclip). The script doesn't seem to run the whole transcript() function: it runs for only a few seconds before the next function starts. How can I make sure transcript() runs until it has finished before the next function starts, without setting a timer? The video files will be larger and of varying sizes later.

For testing: the .wav file is 300 MB, but the text output is only 4 words.

def createAudio(videoclip):

    pathOfSameFolder=str(pathOfFolder)
    dirCreated = True

    try:
        newDir=createDirectory(pathOfSameFolder)
    except OSError:

        print ('Error: Creating directory. ' +  str(pathOfSameFolder) + ' may be existing!')
        dirCreated = False

    if dirCreated :
        audioclip = videoclip.audio
        audioclip.write_audiofile(mp3_file)
        audioclip.close()
        videoclip.close()
        print('Converting audio transcripts into text ...')
        transcript()
        directoryMove(newDir)

This is the whole code:

import moviepy.editor as mp
from moviepy.editor import *
import speech_recognition as sr
import shutil
from random import random
import threading
import time
import asyncio
import os
from pocketsphinx import AudioFile, get_model_path, get_data_path
from sphinxbase.sphinxbase import *

mp4_file = r'/Users/younesyaakoubi/Desktop/5min.mp4'
mp3_file = r'/Users/younesyaakoubi/Desktop/audio_only.wav'

newMethodmp3_file = r'/Users/younesyaakoubi/Desktop/AUDIO_FILE/audio_only.wav'


model_path = get_model_path()
data_path = get_data_path()

path = os.getcwd()

config = {
    'verbose': False,
    'audio_file': os.path.join(data_path, str(mp3_file)),
    'buffer_size': 2048,
    'no_search': False,
    'full_utt': False,
    # 'hmm': os.path.join(model_path, 'en-us'),
    # 'lm': os.path.join(model_path, 'en-us.lm.bin'),
    # 'dict': os.path.join(model_path, 'cmudict-en-us.dict')
}

r = sr.Recognizer()

pathOfFolder= "/Users/younesyaakoubi/Desktop/AUDIO_FILE"
audioFileName= "audio_only.wav"
scriptName="script.txt"

#Save Videofile into object to be handled by next function
def convert():
    videoclip = VideoFileClip(mp4_file)
    createAudio(videoclip)



 #Convert video to audio 
def createAudio(videoclip):

    pathOfSameFolder=str(pathOfFolder)
    dirCreated = True

    try:
        newDir=createDirectory(pathOfSameFolder)
    except OSError:

        print ('Error: Creating directory. ' +  str(pathOfSameFolder) + ' may be existing!')
        dirCreated = False 

    if dirCreated :
        audioclip = videoclip.audio
        audioclip.write_audiofile(mp3_file)
        audioclip.close()
        videoclip.close()
        print('Converting audio transcripts into text ...')
        transcript()
        directoryMove(newDir)

#Checks first if path exists and if not it creates one File
def createDirectory(pathOfFolder):
    # The range defines the maximum number of possible folders
    for num in range(5):
        directory = pathOfFolder + str(num)

        if not os.path.exists(directory):

            print("Directory or File name is: ", directory)

            #Make a new Folder or Directory
            os.makedirs(directory)

            return directory

    # Every candidate folder already exists
    print("Not More Possible. Change Range !")
    exit()

#Move first Audiofile to Folder and change directory to continue
def directoryMove(directory):
    shutil.move('/Users/younesyaakoubi/Desktop/'+str(audioFileName), directory)
    shutil.move('/Users/younesyaakoubi/Desktop/'+str(scriptName), directory)
    
#DOES NOTHING YET !!!! - Downsample 44.1kHz to 8kHz
def downSample():
    # Load into PyDub
    
    print("Downsampling of Audio succesful")

#createFolder('./AudioInput/')
#os.chdir("/Users/younesyaakoubi/Desktop/AUDIO_FILE")
#f.write(audioFile)

def transcript():
    with sr.AudioFile(str(audioFileName)) as source:
        audio_text = r.listen(source)

    #recognize_*() methods will throw a request error if the API is unreachable, hence using exception handling
    try:
        # using Sphinx speech recognition
        text = r.recognize_sphinx(audio_text)

        f = open(str(scriptName),"w+")
        f.write(text)
        f.close()

        print("Converting successful")

    except:
        print('Sorry.. run again...')
    
#keyWordSearch()
def keyWordSearch():
    audio = AudioFile(**config)
    for phrase in audio:
       
        #print(phrase)
        print("Find keywords...")

        f= open(str(scriptName),"a")
        f.write(" "+str(phrase))

        print("Keywords found")
        
        f.close()

#keyWordOrder()
def keyWordOrder():

    print("Classify Keywords")

    with open(str(scriptName)) as file:
        # reading each line
        for line in file:
            # reading each word
            for word in line.split():
                # displaying the words
                print(word)

    with open(str(scriptName)) as file:
        # reading each line
        for line in file:
            # reading each word
            for word in line.split():
                # displaying the words
                print(word)

#See in which Directory the path is described
print ("The current working directory is %s" % path)

convert()

print("Thanks for using xxxx")

2 Answers

Python runs code synchronously unless you explicitly ask for something else: a function call does not return until the function has finished, so the next statement cannot start before the previous one is done.

Applied to your case: if directoryMove() is being executed, then transcript() has already finished, one way or another (it may have failed).
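
To make the ordering concrete, here is a minimal sketch. The two functions are hypothetical stand-ins for transcript() and directoryMove(), not the code from the question, and the sleep only simulates work:

```python
import time

events = []  # records the order in which things happen


def transcript_stub():
    # Hypothetical stand-in for transcript(); the sleep simulates slow work.
    events.append("transcript start")
    time.sleep(0.2)
    events.append("transcript end")


def directory_move_stub():
    # Hypothetical stand-in for directoryMove().
    events.append("move start")


transcript_stub()       # does not return until its body has finished
directory_move_stub()   # therefore always runs after "transcript end"
print(events)
```

However long the sleep is, "move start" is recorded only after "transcript end", so no timer is needed to keep the two calls in order.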

You could add more debug logging inside both functions to convince yourself of this and to investigate what the actual issue is.
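
One possible way to add such logs without editing every function body is a small decorator. This is only a sketch: `traced`, `transcript_stub` and `failing_stub` are hypothetical names introduced for illustration, and the stubs stand in for the real functions. It also logs the full exception, which the bare `except:` in transcript() currently hides behind a single generic message.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.DEBUG,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")


def traced(func):
    """Log entry, exit, duration and any exception of the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.debug("entering %s", func.__name__)
        started = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception:
            # log.exception records the traceback, then we re-raise
            log.exception("%s failed", func.__name__)
            raise
        finally:
            elapsed = time.perf_counter() - started
            log.debug("leaving %s after %.3f s", func.__name__, elapsed)
    return wrapper


@traced
def transcript_stub():
    # Hypothetical stand-in for transcript()
    return "four words of text"


@traced
def failing_stub():
    # Hypothetical stand-in for a step that fails
    raise ValueError("simulated recognition error")


result = transcript_stub()
try:
    failing_stub()
except ValueError as err:
    caught = str(err)
```

With logs like these around transcript(), it is also worth checking how much audio `r.listen(source)` actually captured: in the SpeechRecognition library, `listen()` records a single phrase and stops at the first pause, whereas `record()` reads until the source ends.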

bolino

It would be useful to know what the transcript() function does. I suppose it is launching a thread; in that case, you have to keep a reference to the thread and join() it to wait for the thread to end.
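
A minimal sketch of that pattern, assuming the work really does run in a separate thread. `transcript_worker` is a hypothetical stand-in, not a function from the question:

```python
import threading
import time

results = []


def transcript_worker():
    # Hypothetical background job standing in for a threaded transcript()
    time.sleep(0.2)
    results.append("transcript done")


worker = threading.Thread(target=transcript_worker)
worker.start()   # returns immediately; the job runs in the background
worker.join()    # blocks here until transcript_worker has finished
results.append("move files")
print(results)
```

Without the join() call, "move files" could be appended before the worker finishes; with it, the order is fixed.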

sancelot
  • I will try it with threads. Do you have an easy, understandable source for beginners? – Simon Unge May 05 '21 at 16:25
  • Python's threading module is enough. https://stackoverflow.com/questions/2846653/how-can-i-use-threading-in-python – sancelot May 05 '21 at 17:19
  • No, if no `async` keyword precedes the function, it will not launch an async thread. It's sync by default, in the same thread. – bolino May 06 '21 at 08:19