
You are given an array in the following format:

file_name_and_desti_subdir = [['1020','A'],
                              ['1020','A'],
                              ['1106','A'],
                              ['1003','B'],
                              ['1003','B'],
                              ['1004','C'],
                              ['1005','C']]

Using this array, you are expected to copy files out of a given source directory into a given destination directory. The first element of each pair (e.g. "1020") specifies a filename without extension and the second (e.g. "A") specifies a subdirectory of the destination.

So after the code has finished running the file structure should look like this:

Destination 
  A
     1020.jpg
     1020(1).jpg
     1106.jpg
  B
     1003.jpg
     1003(1).jpg
  C
     1004.jpg
     1005.jpg

To do this you'd need to select the correct item from the source directory and copy it to the desired location in the destination directory.

Here is what I tried (or at least a simplified version of it):

from shutil import copy
from os import walk, path

def copyFile(source, desti, file_name_and_desti_subdir):

    for item in file_name_and_desti_subdir:
        for root, subdir, files in walk(source): #using os.walk to find correct item in source
            for file in files:

                item_source_path = path.join(root, file) #constructing source path of item

                if file.split('.')[0] == item[0]: #choice structure
                    if item[1] == 'A':
                        copy(item_source_path, desti + "\\A")
                    elif item[1] == 'B':
                        copy(item_source_path, desti + "\\B")
                    elif item[1] == 'C':
                        copy(item_source_path, desti + "\\C")

This code will however result in the following file structure:

Destination 
  A
     1020.jpg
     1106.jpg
  B
     1003.jpg
  C
     1004.jpg
     1005.jpg

Note that the items that were copied twice are not included, because they have been overwritten. My question is: how do I prevent this from happening?

P.S.

I have a separate function that handles creating the file structure in the destination folder.

  • this is not a MRE see https://stackoverflow.com/help/minimal-reproducible-example – pippo1980 Aug 06 '21 at 05:31
  • that feature is called "filename versioning", visit for example: https://code.activestate.com/recipes/52663-versioning-file-names/ – VPfB Aug 06 '21 at 05:37
  • @VPfB The example you provided is nearing a solution to my problem. However, it requires knowing beforehand the number of copies I want to make. I'll try making a MRE as suggested. – Christopher Aug 06 '21 at 06:15
  • @pippo1980 I restructured my question. I hope it's easier to understand now. – Christopher Aug 06 '21 at 07:01
  • Is the question just “How do I pick the new file names?”? – Davis Herring Aug 06 '21 at 07:44
  • @DavisHerring It's more along the lines of how do I ensure the duplicates aren't overwritten. I get that I need to add some suffix to the file name: "file(0).jpg, file(1).jpg...etc." I just don't know how to implement it. – Christopher Aug 06 '21 at 08:00
  • @Christopher: That’s the same thing (since overwriting is avoided precisely by choosing new names), but it *seems* like a string-manipulation question too trivial to ask. What am I missing? – Davis Herring Aug 06 '21 at 08:40
  • @DavisHerring It might seem trivial, I know, but I can't think of a way to check for file presence and then choose a new file name for each duplicate. – Christopher Aug 06 '21 at 08:45
  • @Christopher: According to your example, you know when a collision will occur because it’s your code that already used the (undecorated) name. – Davis Herring Aug 06 '21 at 09:26
  • @Christopher filename versioning does not need to know the amount of copies. Please take a look again. I think it does answer the core of your question. – VPfB Aug 06 '21 at 11:15
  • @VPfB Indeed it does. I'd love it if you posted that example as an answer. I ended up using the example you gave and rewriting it so that it adds the version to the file name instead of the extension. I then implemented it into my program. Thank you so much for your help. – Christopher Aug 06 '21 at 11:51
  • @Christopher I'm glad I could help. But I doubt my tiny comment would be sufficient as an answer and the code that gave you inspiration is not mine either. – VPfB Aug 06 '21 at 12:11
  • You could check with file.exist and switch to file versioning if file already there: https://stackoverflow.com/questions/82831/how-do-i-check-whether-a-file-exists-without-exceptions – pippo1980 Aug 06 '21 at 12:48
  • Why do you loop over the array instead of looping over the actual source directory? – pippo1980 Aug 06 '21 at 15:34
  • I'm looping through the array, then scanning the source directory for a matching file name. Do you think it would be more efficient to do it the other way around? @pippo1980 Bear in mind the array will always have fewer items than the source directory. – Christopher Aug 07 '21 at 22:34
  • @Christopher sorry thought given your example it would be the other way round, considered the duplicated images in array but didn’t get the missing images (1006 to 1019, 1021 to 1105) – pippo1980 Aug 08 '21 at 16:50

2 Answers


my attempt:


import numpy as np

file_name_and_desti_subdir = np.array([['1020','A'],
       ['1020','A'],
       ['1106','A'],
       ['1003','B'],
       ['1003','B'],
       ['1004','C'],
       ['1005','C']]) #.astype('object')
    
    
print(file_name_and_desti_subdir, file_name_and_desti_subdir.size, file_name_and_desti_subdir.shape, 
      file_name_and_desti_subdir.ndim, file_name_and_desti_subdir.dtype)

from shutil import copy,copyfile
from os import walk, path, makedirs

def copyFilez(source, desti, file_name_and_desti_subdir):
    nA = 1
    nB = 1
    nC = 1
    for item in file_name_and_desti_subdir:
        print('ITEM :', item)
        for root, subdir, files in walk(source): #using os.walk to find correct item in source
            print('root : ',root)
            print('subdir :',subdir)
            for file in files:
                print(file, item[1])
                item_source_path = path.join(root, file) #constructing source path of item
                print('item_source_path : ', item_source_path)
                if file.split('.')[0] == item[0]: #choice structure
                    if item[1] == 'A':
                        print(desti + "/A/"+file)
                        if not path.exists(desti + "/A"):
                            makedirs(desti + "/A", exist_ok=True)
                        if path.isfile(desti + "/A/"+file):
                            # name already taken: add a running number before the extension
                            copy(item_source_path, desti + "/A/"+file.split('.')[0]+'_'+str(nA)+'.'+file.split('.')[1])
                            nA += 1
                        else:
                            copyfile(item_source_path, desti + "/A/"+file)
                    elif item[1] == 'B':
                        if not path.exists(desti + "/B/"):
                            makedirs(desti + "/B", exist_ok=True)
                        if not path.isfile(desti + "/B/"+file):
                            copy(item_source_path, desti + "/B/"+file)
                        else:
                            copy(item_source_path, desti + "/B/"+file.split('.')[0]+'_'+str(nB)+'.'+file.split('.')[1])
                            nB += 1 
                    elif item[1] == 'C':
                        if not path.exists(desti + "/C"):
                            makedirs(desti + "/C", exist_ok=True)
                        if not path.isfile(desti + "/C/"+file):
                            copy(item_source_path, desti + "/C/"+file)
                        else:
                            copy(item_source_path, desti + "/C/"+file.split('.')[0]+'_'+str(nC)+'.'+file.split('.')[1])
                            nC += 1 

        
copyFilez('SOURCE', 'DEST', file_name_and_desti_subdir)

As suggested by Christopher, I rewrote the script so that every copied file is numbered, using an array to count how many times each file has appeared so far for each subdirectory:

from shutil import copy
from os import walk, path, makedirs
import numpy as np

file_name_and_desti_subdir = np.array([['1020', 'A'],
                                       ['1020', 'A'],
                                       ['1106', 'A'],
                                       ['1003', 'B'],
                                       ['1003', 'B'],
                                       ['1004', 'C'],
                                       ['1005', 'C'],
                                       ['1205', 'A'],
                                       ['1205', 'A'],
                                       ['1205', 'A'],
                                       ['1205', 'A'],
                                       ['1205', 'B'],
                                       ['1205', 'C']])  # .astype('object')



def copyFilez(source, desti, file_name_and_desti_subdir):
    file_name_and_desti_subdir_copy = np.zeros((file_name_and_desti_subdir.shape[0]) ,dtype = "object")

    for i in range(file_name_and_desti_subdir.shape[0]):
        file_name_and_desti_subdir_copy[i] = file_name_and_desti_subdir[i,0]+file_name_and_desti_subdir[i,1]
        
    file_name_and_desti_subdir_copy2 = np.zeros((file_name_and_desti_subdir.shape[0],4) ,dtype = "object")
    
    for i in range(file_name_and_desti_subdir_copy.shape[0]):
        file_name_and_desti_subdir_copy2[i,0] = file_name_and_desti_subdir[i,0]
        file_name_and_desti_subdir_copy2[i,1] = file_name_and_desti_subdir[i,1]
        file_name_and_desti_subdir_copy2[i,2] = file_name_and_desti_subdir_copy[i]
        file_name_and_desti_subdir_copy2[i,3] = str(np.count_nonzero(file_name_and_desti_subdir_copy[:i+1] == file_name_and_desti_subdir_copy[i])).zfill(6)
        
    print(file_name_and_desti_subdir_copy2, file_name_and_desti_subdir_copy2.size, file_name_and_desti_subdir_copy2.shape)
    
    
    
    for item in file_name_and_desti_subdir_copy2:
        print('ITEM :', item)
        # using os.walk to find correct item in source
        for root, subdir, files in walk(source):
            print('root : ', root)
            print('subdir :', subdir)
            for file in files:
                print(file, item[1])
                # constructing source path of item
                item_source_path = path.join(root, file)
                print('item_source_path : ', item_source_path)
                if file.split('.')[0] == item[0]:  # choice structure
                    if item[1] == 'A':
                        print(desti + "/A/"+file)
                        if not path.exists(desti + "/A"):
                            makedirs(desti + "/A", exist_ok=True)
                        copy(item_source_path, desti + "/A/"+file.split('.')[0]+"_"+ item[3] + "." +file.split('.')[1])


                    elif item[1] == 'B':
                        if not path.exists(desti + "/B/"):
                            makedirs(desti + "/B", exist_ok=True)
                        copy(item_source_path, desti + "/B/"+file.split('.')[0]+"_"+ item[3] + "." +file.split('.')[1])
                    elif item[1] == 'C':
                        if not path.exists(desti + "/C"):
                            makedirs(desti + "/C", exist_ok=True)
                        copy(item_source_path, desti + "/C/"+file.split('.')[0]+"_"+ item[3] + "." +file.split('.')[1])

copyFilez('SOURCE', 'DESTinazione', file_name_and_desti_subdir)

It works by creating two new arrays; the last one looks like this:

[['1020' 'A' '1020A' '000001']
 ['1020' 'A' '1020A' '000002']
 ['1106' 'A' '1106A' '000001']
 ['1003' 'B' '1003B' '000001']
 ['1003' 'B' '1003B' '000002']
 ['1004' 'C' '1004C' '000001']
 ['1005' 'C' '1005C' '000001']
 ['1205' 'A' '1205A' '000001']
 ['1205' 'A' '1205A' '000002']
 ['1205' 'A' '1205A' '000003']
 ['1205' 'A' '1205A' '000004']
 ['1205' 'B' '1205B' '000001']
 ['1205' 'C' '1205C' '000001']]

The occurrences are counted with np.count_nonzero.

Leading zeros are added to the numbering with str(number).zfill().
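The same running count can also be kept without NumPy. The following is only a minimal sketch, assuming the `name(1).jpg` naming from the question and a fixed `.jpg` extension; the function name `numbered_targets` and its `extension` parameter are hypothetical and not part of the script above.

```python
from collections import defaultdict

def numbered_targets(pairs, extension=".jpg"):
    """Return (name, subdir, target_filename) for each pair, in input order.

    Hypothetical helper: the first occurrence of a (name, subdir) pair keeps
    the plain name, later occurrences get "(1)", "(2)", ... appended.
    """
    seen = defaultdict(int)  # (name, subdir) -> occurrences seen so far
    targets = []
    for name, subdir in pairs:
        n = seen[(name, subdir)]
        seen[(name, subdir)] += 1
        suffix = "" if n == 0 else f"({n})"
        targets.append((name, subdir, f"{name}{suffix}{extension}"))
    return targets
```

Because the counter is keyed on the (name, subdir) pair, the numbering restarts for every file and every subdirectory, and the list is traversed only once.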

pippo1980
  • I actually think this is more efficient than my solution, because I use `.count` on every item in the array and check if it's greater than 2 to find all duplicates and their amounts, so I know for which items and how many times to run VersionFile; that is O(n^2). This solution is nice and linear. Correct me if I'm wrong though, but if you encounter a second duplicate in the array, won't it continue counting where the last duplicate stopped? Something like "1000_1.jpg, 1000_2.jpg ... then ... 1055_3.jpg, 1055_4.jpg" if they were all A. Not that this would be an issue, honestly. It still works. – Christopher Aug 08 '21 at 22:47
  • @christopher Yes, you are right about the numbering issue. My focus was on os.makedirs because I was getting an error when passing a non-existent path to shutil.copy(). – pippo1980 Aug 09 '21 at 04:16
  • If I have time I’ll try the new array approach as you suggested, – pippo1980 Aug 09 '21 at 04:19

For anyone interested, I ended up using an example provided by VPfB in the comments. The example he provided was not his own. It's written in Python 2, so I had to make some changes. It also appended the version of the file to the extension, which causes some trouble. (Original Example) Here is my adjusted version.

from os import path
from shutil import copy

def VersionFile(source, destination):
    # Only acts when the plain destination name is already taken.
    if path.isfile(destination):
        name, extension = path.splitext(destination)
        # Try "name (0).ext", "name (1).ext", ... until a free name is found.
        for i in range(1000):
            new_file = f'{name} ({i}){extension}'
            if not path.isfile(new_file):
                copy(source, new_file)
                break
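As a variation, the first copy and the versioned copies can be handled by one helper. This is a minimal sketch under the same `name (i).ext` naming as above; `copy_without_overwrite` is a hypothetical name, and the check-then-copy sequence is not safe if several processes write to the same folder at once.

```python
from os import path
from shutil import copy

def copy_without_overwrite(source, destination):
    """Copy source to destination, or to the first free 'name (i).ext'.

    Hypothetical helper: returns the path that was actually written.
    """
    name, extension = path.splitext(destination)
    target = destination
    i = 0
    # Keep trying "name (0).ext", "name (1).ext", ... while the name is taken.
    while path.exists(target):
        target = f"{name} ({i}){extension}"
        i += 1
    copy(source, target)
    return target
```

With a helper like this, the main loop would not need to know in advance how many duplicates exist, since each call picks the next unused name by itself.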

For this to work I also needed to look for any duplicates in the array and then count the number of times each appears. I save the duplicates in one array and the number of times they appear in another, parallel array.
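The duplicate bookkeeping could also be done in a single pass with the standard library. A minimal sketch, assuming each entry is a `[name, subdir]` pair; `count_duplicates` is a hypothetical name, and it returns one dictionary rather than the two parallel arrays described above.

```python
from collections import Counter

def count_duplicates(pairs):
    """Map each (name, subdir) pair that occurs more than once to its count."""
    # Lists are not hashable, so each pair is converted to a tuple first.
    counts = Counter(tuple(pair) for pair in pairs)
    return {key: n for key, n in counts.items() if n > 1}
```

For the array in the question this would give a count of 2 for `('1020', 'A')` and for `('1003', 'B')`, and nothing for the entries that appear only once.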

Then in the core loop I changed the ifs and elifs as follows:

if item[1] == 'A':
    copy(item_source_path, desti + "\\A")
    if item in duplicates:
        for i in range(duplicates_count[duplicates.index(item)]):
            VersionFile(item_source_path, desti + "\\A\\" + file)
    break

(Added breaks to cut down on run time)

Then, after the first for loop, I just added a check to ensure all subsequent occurrences of that item are ignored. I also need to refer to item by index from now on. (Duplicates will always appear right next to one another.)

for i in range(len(file_name_and_desti_subdir)):
    item = file_name_and_desti_subdir[i]
    # skip an item that repeats the previous one (never skip the first item)
    if i > 0 and item == file_name_and_desti_subdir[i - 1]:
        pass
    else:
        ...  # (main loop)
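Since duplicates are assumed to sit next to one another, the skip check and the counting can be combined. A minimal sketch using `itertools.groupby`; `consecutive_runs` is a hypothetical name, and it only groups entries that are adjacent, so it relies on that ordering assumption.

```python
from itertools import groupby

def consecutive_runs(pairs):
    """Return [((name, subdir), run_length), ...] for adjacent equal pairs."""
    as_tuples = (tuple(pair) for pair in pairs)
    return [(key, sum(1 for _ in group)) for key, group in groupby(as_tuples)]
```

Each run is then visited once, and its length says how many copies of that file are wanted in the subdirectory.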
  • does passing path to shutil.copy works now ? Was trying to answer (left for the week-end) and got some error that I ascribed to https://stackoverflow.com/questions/2793789/create-destination-path-for-shutil-copy-files – pippo1980 Aug 08 '21 at 16:57
  • I've been passing strings like, "C:\\foo\\bar", and I haven't gotten any errors. – Christopher Aug 08 '21 at 22:20
  • Ok thanks got it . I was using as in the link above the : “If a path such as b/c/ does not exist in ./a/b/c , shutil.copy("./blah.txt", "./a/b/c/blah.txt") will complain that the destination does not exist. What is the best way to create both the destination path and copy the file to this path?”’ approach. – pippo1980 Aug 09 '21 at 04:22