1
  • I need to parse a .docx document and check whether the .wav files mentioned in it are present in a sound directory (assuming the sound directory exists and contains some .wav files).
  • I can parse the document and store the .wav file names in a list, but I don't know how to check whether the list items exist in the sound directory.
  • Also, I cannot provide the full path of the sound directory.
  • My directory structure looks like "E:\Package\somefolder\sound".
  • The code that builds the list is shown below.
import os
import docx2txt
import re
    
parent_dir = r"E:\PackageTesting"  # Your directory here (raw string so backslashes are not treated as escapes)

def get_all_files(dir_name):
    file_list = os.listdir(dir_name)
    all_files = list()

    # Iterate over all the entries
    for entry in file_list:
        # Create full path
        full_path = os.path.join(dir_name, entry)
        # If entry is a directory then get the list of files in this directory
        if os.path.isdir(full_path):
            all_files = all_files + get_all_files(full_path)
        else:
            all_files.append(full_path)
    
    return all_files

all_files_in_dirs = get_all_files(parent_dir)

for file in all_files_in_dirs:
    # Only process Word documents whose name contains "Sound_Doc"
    if "Sound_Doc" in os.path.basename(file) and os.path.splitext(file)[1] == ".docx":
        print(file)
        MY_TEXT = docx2txt.process(file)
        # Collect tokens that look like wav file names
        wav = re.findall(r'[\w\.-]+w+a+[\w\.-]+', MY_TEXT)
        print(wav)
TheOneMusic
vickyD
  • Could you please be a little more specific about the **actual** issue you're facing? If I'm reading the question correctly, the fact that you're reading the list of files from a Word document is irrelevant. What you have is a list of filenames, and you'd like to know if they exist in a given directory? – Roy2012 Jul 28 '20 at 06:34
  • @Roy2012 I have to get the list from the Word document because all the files are mentioned there. Some file names also carry special characters that need to be removed, so I store the file names in a list and then remove the special characters. – vickyD Jul 28 '20 at 07:47
  • @TheOneMusic Thanks, this is working perfectly. Also, is there any way to strip special characters only from the beginning or end of the wav string? Some appear with a dot at the end, like abc.wav. – vickyD Jul 29 '20 at 04:57
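
One possible way to handle the last comment is `str.strip()` with a set of characters, which trims only the two ends of a string and leaves inner dots and dashes alone. This is a minimal sketch, not code from the thread: `clean_wav_name` and the sample names are hypothetical, and stripping all of `string.punctuation` is an assumption that would also remove a leading underscore or dash that is really part of a file name.

```python
import string

def clean_wav_name(raw):
    """Strip punctuation and whitespace from both ends of a token.

    Characters inside the name (dots, dashes, underscores) are kept.
    """
    return raw.strip(string.punctuation + string.whitespace)

# Hypothetical tokens as they might come out of the document text
names = ["abc.wav.", "(intro-01.wav)", "  beep.wav,"]
cleaned = [clean_wav_name(n) for n in names]
print(cleaned)  # ['abc.wav', 'intro-01.wav', 'beep.wav']
```

If only trailing characters should go, `rstrip()` takes the same argument.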

3 Answers

0

You can list all the files in a directory and iterate over them like this:

import os

directory = "."  # placeholder: the folder you want to scan
for filename in os.listdir(directory):
    with open(os.path.join(directory, filename)) as f:
        pass  # your code here
0

Have a look at this post; it explains how to look for files in a directory recursively. You could look for each of the wav files like this:

import os

def find_file(root_dir, filename):
    # Walk root_dir and all of its subdirectories
    for root, dirs, files in os.walk(root_dir):
        for file in files:
            if file == filename:
                return True  # file was found
    print("Not found: " + filename)  # report the missing file name
    return False  # file was not found

# wav_file_list holds the names parsed from the document
for wav_file in wav_file_list:
    find_file(parent_dir, wav_file)

If you have a lot of files and directories, you might want to collect all wav files into a set first to speed up the lookup. Otherwise your script might take a long time to run, because it walks the whole directory tree once for every file name.

Create the set like this:

import os

def find_all_wav_files(root_dir):
    wav_files = set()
    for root, dirs, files in os.walk(root_dir):
        for file in files:
            if file.endswith(".wav"):
                # Store the bare file name so it can be compared
                # with the names parsed from the document
                wav_files.add(file)
    return wav_files

You can then check if the wav files exist like this:

wav = parse_your_docx(docx_file)
wav_files = find_all_wav_files("your/sound/directory")

for w in wav:
    if w in wav_files:
        print(w + " exists!")
    else:
        print(w + " does not exist!")

The function find_all_wav_files() returns a set of all wav files found under the directory you pass as the first argument. You can then test membership with `in`, just as you would with a list, as shown in the snippet above.
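
To report only the names from the document that are missing on disk, the same idea can be sketched as a single function. This is an illustration under assumptions, not the answerer's code: `find_missing_wavs` is a hypothetical helper, names are compared case-insensitively (an assumption that suits Windows paths), and the demo builds a throwaway directory tree instead of using the real package folder.

```python
import os
import tempfile

def find_missing_wavs(expected_names, root_dir):
    """Return the sorted names from expected_names with no matching file under root_dir."""
    found = set()
    for _root, _dirs, files in os.walk(root_dir):
        for name in files:
            if name.lower().endswith(".wav"):
                found.add(name.lower())  # bare name, lower-cased for comparison
    return sorted(n for n in set(expected_names) if n.lower() not in found)

# Demo on a temporary tree that mimics <root>/somefolder/sound
with tempfile.TemporaryDirectory() as tmp:
    sound_dir = os.path.join(tmp, "somefolder", "sound")
    os.makedirs(sound_dir)
    for name in ("a.wav", "B.WAV"):
        open(os.path.join(sound_dir, name), "w").close()
    missing = find_missing_wavs(["a.wav", "b.wav", "c.wav"], tmp)
    print(missing)  # ['c.wav']
```

The tree is walked once, so the cost does not grow with the number of names parsed from the document.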

toydarian
  • Your first solution works only partially. It prints only the files that are in the sound directory and match the parsed list. What I want is the reverse: check each file in the list against the sound directory and, if a file has no match, print its name (e.g. a.wav is not available in sound directory). – vickyD Jul 28 '20 at 09:56
  • I moved the `print()` in the code so it will print files that were not found – toydarian Jul 28 '20 at 10:00
  • It is printing the .wav files that are not in the list but are in the sound directory – vickyD Jul 28 '20 at 10:15
  • You are right, I forgot to change the `print()`. See edit now – toydarian Jul 28 '20 at 10:45
0

You can use the glob module to list the files and directories in a directory.

The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order.

**syntax:**
 glob.glob(pathname, *, recursive=False)

Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell). Whether or not the results are sorted depends on the file system.

Note: use * to match everything; similarly, use '*.wav' and '*.docx' to match wav files and Word documents respectively.

ex: glob.glob(r'E:\Package\somefolder\sound\*.wav')
Y Munawwer
  • @Y Munawwer I cannot provide the absolute path in my case. The script needs to traverse the whole directory tree to find the sound subdirectory and then verify whether the stored wav files are available – vickyD Jul 28 '20 at 08:00
  • @vickyD If that is the case, you can use a glob pattern with recursive set to True, like this: files = glob.glob('/root_directory/**/sound/*.wav', recursive=True). This returns a list of files; if the list is empty, either the files or the directory is missing. You can also use an environment variable in place of root_directory, e.g. os.getenv('HOME'). And please accept the answer if you found it useful. – Y Munawwer Jul 28 '20 at 10:36
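
The recursive pattern from the last comment can be tried out as follows. This is a rough sketch with assumptions: `wavs_in_sound_dirs` is a hypothetical helper, the directory is assumed to be named exactly `sound` (matching is case-sensitive on some file systems), and the demo runs against a temporary tree rather than the real package folder. Note that `**` and `*` skip names that start with a dot.

```python
import glob
import os
import tempfile

def wavs_in_sound_dirs(root_dir):
    """Return the bare names of all .wav files inside any 'sound' directory under root_dir."""
    # glob.escape protects special characters such as [ or * in the root path itself
    pattern = os.path.join(glob.escape(root_dir), "**", "sound", "*.wav")
    return {os.path.basename(p) for p in glob.glob(pattern, recursive=True)}

# Demo: one wav inside a sound directory, one outside of it
with tempfile.TemporaryDirectory() as tmp:
    sound_dir = os.path.join(tmp, "Package", "somefolder", "sound")
    other_dir = os.path.join(tmp, "Package", "other")
    os.makedirs(sound_dir)
    os.makedirs(other_dir)
    open(os.path.join(sound_dir, "a.wav"), "w").close()
    open(os.path.join(other_dir, "b.wav"), "w").close()
    names = wavs_in_sound_dirs(tmp)
    print(names)  # {'a.wav'}
```

The returned set can be compared directly with the names parsed from the document, for example `set(wav) - names` for the missing ones.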