-1

I have a folder with a large number of files (mask_folder). The filenames in this folder are built as follows:

  • asdgaw-1454_mask.tif
  • lkafmns-8972_mask.tif
  • sdnfksdfk-1880_mask.tif
  • etc.

In another folder (test_folder), I have a smaller number of files with filenames written almost the same, but without the addition of _mask. Like:

  • asdgaw-1454.tif
  • lkafmns-8972.tif
  • etc.

What I need is a code to find the files in mask_folder that have an identical start of the filenames as compared to the files in test_folder and then these files should be copied from the mask_folder to the test_folder.

In that way the test_folder contains paired files as follows:

  • asdgaw-1454_mask.tif
  • asdgaw-1454.tif
  • lkafmns-8972_mask.tif
  • lkafmns-8972.tif
  • etc.

This is what I tried, it runs without any errors but nothing happens:

import shutil
import os

mask_folder = "//Mask/"
test_folder = "//Test/"

n = 8
list_of_files_mask = []
list_of_files_test = []
for file in os.listdir(mask_folder):
    if not file.startswith('.'):
        list_of_files_mask.append(file)
        start_mask = file[0:n]
    print(start_mask)

for file in os.listdir(test_folder):
    if not file.startswith('.'):
        list_of_files_test.append(file)
        start_test = file[0:n]
    print(start_test)

for file in start_test:
    if start_mask == start_test:
        shutil.copy2(file, test_folder)

The past period I searched for but not found a solution for above mentioned problem. So, any help is really appreciated.

Ralf
  • 16,086
  • 4
  • 44
  • 68
Robert
  • 3
  • 2

1 Answers1

0

First, you want to get only the files, not the folders as well, so you should probably use os.walk() instead of listdir() to make the solution more robust. Read more about it in this question.

Then, I suggest loading the filenames of the test folder into memory (since they are the smaller part) and then NOT load all the other files into memory as well but instead copy them right away.

import os
import shutil
test_dir_path = ''
mask_dir_path = ''

# load file names from test folder into a list
test_file_list = []
for _, _, file_names in os.walk(test_dir_path):
    # 'file_names' is a list of strings
    test_file_list.extend(file_names)

    # exit after this directory, do not check child directories
    break

# check mask folder for matches
for _, _, file_names in os.walk(mask_dir_path):
    for name_1 in file_names:
        # we just remove a part of the filename to get exact matches
        name_2 = name_1.replace('_mask', '')

        # we check if 'name_2' is in the file name list of the test folder
        if name_2 in test_file_list:
            print('we copy {} because {} was found'.format(name_1, name_2))
            shutil.copy2(
                os.path.join(mask_dir_path, name_1),
                test_dir_path)

    # exit after this directory, do not check child directories
    break

Does this solve your problem?

Ralf
  • 16,086
  • 4
  • 44
  • 68