Find missing filenames in nested file structure

Question

I have a source directory with subdirectories containing files. I also have a destination directory whose subdirectories are organized differently.

fileNames = <get all file names from source directory>
for fileName in fileNames {
    if <not found in destination directory> {
         print fileName
    }
}

How can I implement the pseudocode above?

EDIT:

Example file structure:
./sourcedir/file1.txt
./sourcedir/foldera/file2.txt
./sourcedir/foldera/missingfile.txt

./destdir/file2.txt
./destdir/folderb/file1.txt

So missingfile.txt should be printed, but not file1.txt or file2.txt, since they can be found somewhere under destdir.

EDIT2: I managed to write a Python implementation that does what I was aiming for. I had some trouble with the bash answers when I tried them. Can it be done more simply in bash?

import os
import fnmatch

sourceDir = "./sourcedir"
destinationDir = "./destdir"

def find_files(directory, pattern):
    # Yield the path of every file under `directory` whose name matches `pattern`.
    for root, dirs, files in os.walk(directory):
        for basename in files:
            if fnmatch.fnmatch(basename, pattern):
                filename = os.path.join(root, basename)
                yield filename

print(sourceDir)
for sourcefilename in find_files(sourceDir, '*'):
    #if not sourcefilename.lower().endswith(('.jpg', '.jpeg', '.gif', '.png','.txt','.mov','3gp','mp4','bmp')):
    #    continue
    sourceBaseName = os.path.basename(sourcefilename)
    shouldPrint = True
    # Search the whole destination tree for a file with the same name.
    for destfilename in find_files(destinationDir, '*'):
        if sourceBaseName == os.path.basename(destfilename):
            shouldPrint = False
            break
    if shouldPrint:
        print('Missing file:', sourcefilename)
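The loop above walks the entire destination tree again for every source file. A minimal sketch using only the standard library (the function name `find_missing_by_name` is hypothetical) collects the destination file names into a set in a single pass, so each check becomes a fast set membership test:

```python
import os


def find_missing_by_name(source_dir, dest_dir):
    """Return source file paths whose base name appears nowhere under dest_dir."""
    # Walk the destination tree once and remember every file name seen.
    dest_names = set()
    for _root, _dirs, files in os.walk(dest_dir):
        dest_names.update(files)

    # Walk the source tree and keep the files whose name never appeared in dest.
    missing = []
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            if name not in dest_names:
                missing.append(os.path.join(root, name))
    return sorted(missing)


if __name__ == "__main__":
    for path in find_missing_by_name("./sourcedir", "./destdir"):
        print("Missing file:", path)
```

With the example layout from the question, this reports only ./sourcedir/foldera/missingfile.txt.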

You have tagged this as [tag:bash] and [tag:python]. Are you assigning tags randomly, or do you require the solution to be specifically in either of these languages (why?)? — tripleee, Jul 22 '16 at 19:44
Good point. I immediately thought those two tags would make sense, so that I could understand the answer more easily. Maybe it is better to stick to one language to avoid mixing things up too much. — user317706, Jul 22 '16 at 19:52

score 1 · Answer 1 · Tom Gijselinck · 2016-07-22T19:35:46.963

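Using bash, this can easily be done by running diff -r source_dir target_dir | grep 'Only.*source_dir' | awk '{print $4}'.

diff -r source_dir target_dir recursively compares the two directories and prints an "Only in DIR: NAME" line for every entry that exists on only one side
grep 'Only.*source_dir' keeps the entries that exist under source_dir but not at the same relative path under target_dir (a subdirectory missing from target_dir is reported once as a whole, not file by file)
awk '{print $4}' prints the fourth field, i.e. the entry name (this breaks for names containing spaces)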
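For readers following the Python tag, the standard library's `filecmp.dircmp` reports the same kind of "only in the left tree" entries that `diff -r` prints. This sketch (the function name `only_in_left` is hypothetical) makes the path-based matching visible: entries are compared by relative path, and a subdirectory missing on the right is reported as a single entry.

```python
import filecmp
import os


def only_in_left(left, right):
    """Collect paths present under `left` but absent at the same relative path under `right`.

    Roughly mirrors the "Only in <left>: <name>" lines of `diff -r left right`.
    Note: dircmp skips a few names by default, such as .git and __pycache__.
    """
    result = []

    def walk(cmp):
        # left_only: names in cmp.left that have no counterpart in cmp.right.
        for name in cmp.left_only:
            result.append(os.path.join(cmp.left, name))
        # subdirs maps each common subdirectory name to a nested dircmp object.
        for sub in cmp.subdirs.values():
            walk(sub)

    walk(filecmp.dircmp(left, right))
    return sorted(result)
```

On the example tree from the question, this returns ./sourcedir/file1.txt and ./sourcedir/foldera rather than missingfile.txt, because nothing is matched by bare file name.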

edited Jul 22 '16 at 19:35

answered Jul 22 '16 at 19:20


score 0 · Answer 2 · answered Jul 22 '16 at 19:25

A bit of a hack, but you could do something with find and diff; no Python needed:

diff -u <(cd sourcedir && find . -type f | sort) <(cd destdir && find . -type f | sort) |\
grep '^-\./' | sed 's/^-//'

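This compares the lists of relative file paths under sourcedir and destdir and then prints only the paths that exist in sourcedir but not in destdir. Because whole relative paths are compared, a file that lives in a different subdirectory of destdir (such as file1.txt in the question's example) is still reported as missing.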
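The same relative-path comparison can be sketched in Python as a set difference, which does not depend on the order in which files are listed, unlike a line-by-line diff. The helper names `relative_files` and `missing_paths` are hypothetical.

```python
import os


def relative_files(top):
    """Return file paths under `top`, relative to it (roughly `cd top && find . -type f`, without the leading ./)."""
    paths = set()
    for root, _dirs, files in os.walk(top):
        for name in files:
            paths.add(os.path.relpath(os.path.join(root, name), top))
    return paths


def missing_paths(source_dir, dest_dir):
    """Relative paths that exist under source_dir but not at the same place under dest_dir."""
    return sorted(relative_files(source_dir) - relative_files(dest_dir))
```

On the question's example, missing_paths returns file1.txt, foldera/file2.txt and foldera/missingfile.txt, since none of those relative paths exists under destdir; comparing os.path.basename of each path instead gives the name-anywhere behavior the question asks for.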
