
I am trying to write a program that takes a very large directory (10,000+ files inside) and creates new subdirectories to break it into smaller chunks (of approximately 100 files each). The program that I have currently raises no errors when I call it in the terminal, but it does not actually sort the large file... I think the problem is with os.rename(), but I don't understand why. I also tried shutil.move() and still had the same problem. Sorry I couldn't make the code appear in color; I am new to the site.

#!/usr/bin/python
import os
import glob
import sys
from functools import partial
sys.setrecursionlimit(1000) 

def mk_osdict(a):
    #os.chdir(a)
    #grouping files with .mol2 endings only
    os_list =glob.glob("*.mol2")
    #making a dictionary for the list of files in the directory
    os_dict = dict([i,n] for i,n in zip(range(len(os_list)),os_list))
    return os_dict

dict_os = mk_osdict("decoys")

#function to sort files into new directories with a specific size. 
def init_path(f):   
    block = (len(f)/100)+1
    #i_lst gives a list of the number of entries
    i_lst = [str(i) for i in range(block)]
    '''paths keys will become new directories, values will be a list
    files to be sorted into the corresponding directory'''
    paths = dict(["decoydir"+n.zfill(5),[]] for n in i_lst)
    for lst in paths.values():
        while len(lst) <= block:
            for value in f.values():
                lst.append(value)
    for x,p in paths:
        if not os.path.exists(x):
            os.mkdir(x)
        else:
            pass   
        for index in p:
            yield os.rename(index,os.path.join(x,index))

b = init_path(dict_os )
P_A_Logan
  • You mention 'but it does not actually sort the large file'; do you mean 'it does not remove the files from the big directory'? Rename doesn't sort anything; it renames files, and a side-effect of renaming might be to move an individual file from one (big) directory into one (new, small) directory. I have not explored what happens if you're busy changing the contents of a directory while a process is scanning it, but it probably won't break. – Jonathan Leffler Apr 01 '15 at 16:06
  • This isn't going to work for several reasons, but part of your problem is that `init_path` uses the `yield` statement which makes it a generator. So, just calling it `b = init_path(dict_os)` (which also doesn't work because two params are needed) simply initializes the generator and doesn't do any renames. – tdelaney Apr 01 '15 at 16:10
  • @JonathanLeffler sorry for the confusion. What I mean is that when I run the program that no files get sorted into the new directories created. – P_A_Logan Apr 01 '15 at 18:09
  • @tdelaney that was a typo, but when I fixed it the program still did not sort. Maybe can you explain how I can get the generator to work in this case? One is created in the function regardless of if I use yield or not. (Not really sure why). I've seen generators used with the os module before to join paths (ex: yield os.path.join(path,name)) – P_A_Logan Apr 01 '15 at 18:09
  • We have a terminology problem... you aren't sorting files, you are moving them. So, I think when you say "did not sort" you really mean "did not move". Part of your problem is that there is no need to do a `yield` here. When you say `yield os.rename(index,os.path.join(x,index))` you are saying that you want the algorithm to start calling `os.rename` and yielding its return code when another part of the program iterates `init_path`. Just remove the `yield` and you'll get around that part. As for the rest, add some print statements and see where things go sideways. – tdelaney Apr 01 '15 at 18:18

2 Answers

0

My answer probably won't tell you what is wrong with your code, but I think it will help you solve your initial problem. I'm sure this is not the most efficient way to solve it, but it's easily testable and, in my opinion, quite readable.

import os

def read_dir(adir):
    files = os.listdir(adir)

    # do some filtering of files to get only the files you want
    ...

    return files

# creates n amount of subdirs in a given dir
# dirs get named 0,1,2,3...
def create_subdirs(apath, n):
    for i in range(n):
        os.makedirs(apath + str(i))

def move_files(myfiles, frm, to):
    for fl in myfiles:
        os.rename(frm+fl, to+fl)

# yields chunks of a list of specific size
def chunks(l, n):
    """ Yield successive n-sized chunks from l.
    """
    for i in range(0, len(l), n):
        yield l[i:i+n]

A_VERY_LARGE_DIR = "/path/to/dir/"
files_in_large_dir = read_dir(A_VERY_LARGE_DIR)
files_in_chunks = list(chunks(files_in_large_dir, 100))
# one subdir per chunk, so no empty trailing dir and no index past the last chunk
number_of_subdirs = len(files_in_chunks)

create_subdirs(A_VERY_LARGE_DIR, number_of_subdirs)

for i in range(number_of_subdirs):
    topath = A_VERY_LARGE_DIR + str(i) + "/"
    move_files(files_in_chunks[i], A_VERY_LARGE_DIR, topath)

Note: This is not complete code. Some functionality has to be added for filtering files, paths need to be filled in, etc.

Note2: The chunks function I stole (borrowed :D ) from this thread

rfmind
  • Great! So this seems to work, but how come your generator function works but the one in the original does not? I used the yield from this example I found on David Beazley's site (dabeaze.com):
        import os
        import fnmatch
        def gen_find(filepat,top):
            for path, dirlist, filelist in os.walk(top):
                for name in fnmatch.filter(filelist,filepat):
                    yield os.path.join(path,name)
    – P_A_Logan Apr 01 '15 at 17:58
  • I think the problem is that the original code yields os.rename which just yields that expression and doesn't execute it. In my code I just yield a sublist. I could be wrong :) – rfmind Apr 01 '15 at 18:14
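
As a small illustration of the `yield` discussion in the comments, here is a sketch (not taken from either poster's code) showing that calling a generator function runs none of its body; the side effects only happen once something iterates the result. The names `record` and `lazy_moves` are hypothetical, and `record` merely stands in for a call with a side effect such as `os.rename`.

```python
calls = []

def record(name):
    # Hypothetical stand-in for a side effect such as os.rename.
    calls.append(name)
    return name

def lazy_moves(names):
    # Because of the yield, calling lazy_moves() only builds a generator object.
    for name in names:
        yield record(name)

gen = lazy_moves(["a.mol2", "b.mol2"])
calls_before = list(calls)   # snapshot: the body has not run yet
results = list(gen)          # iterating the generator drives the loop
calls_after = list(calls)    # snapshot: both side effects have now happened

print(calls_before)
print(calls_after)
```

This would suggest why `b = init_path(dict_os)` in the question moves nothing: `b` is never iterated. Dropping the `yield`, or looping over `b`, makes the calls actually run.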
0

You can perform this task more simply using a few list manipulations on the files returned by glob. Creating intermediate data structures makes the code more confusing - you can just do the directory creation and moves as you go:

import os
import glob

def mk_tree(path):
    files = glob.glob(os.path.join(path, "*.mol2"))
    chunks = [files[chunk:chunk+100] for chunk in range(0, len(files), 100)]
    for i, chunk in enumerate(chunks):
        new_dir = os.path.join(path, "decoydir%05d" % i)
        os.mkdir(new_dir)
        for fn in chunk:
            os.rename(fn, os.path.join(new_dir, os.path.basename(fn)))
tdelaney
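
For anyone who wants to try the chunk-and-move approach without touching real data, the following is a minimal sketch that exercises a variant of `mk_tree` on a throwaway directory. It assumes Python 3. The `size` parameter, the `sorted()` call, `exist_ok=True`, the `demo` helper and the `decoyNNNN.mol2` file names are all additions made for this illustration, not part of the answer above.

```python
import glob
import os
import tempfile

def mk_tree(path, size=100):
    # Sorted so the grouping is deterministic (an assumption; glob order is arbitrary).
    files = sorted(glob.glob(os.path.join(path, "*.mol2")))
    for start in range(0, len(files), size):
        new_dir = os.path.join(path, "decoydir%05d" % (start // size))
        # exist_ok lets the function be re-run without failing on existing dirs.
        os.makedirs(new_dir, exist_ok=True)
        for fn in files[start:start + size]:
            os.rename(fn, os.path.join(new_dir, os.path.basename(fn)))

def demo(n_files=250):
    # Hypothetical helper: build empty test files, split them, report the layout.
    with tempfile.TemporaryDirectory() as top:
        for k in range(n_files):
            open(os.path.join(top, "decoy%04d.mol2" % k), "w").close()
        mk_tree(top)
        return {d: len(os.listdir(os.path.join(top, d)))
                for d in sorted(os.listdir(top))}

layout = demo()
print(layout)
```

With 250 test files this should leave two full directories of 100 files and one with the remaining 50, and nothing left over at the top level.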