
Edit: I've renamed the data files so that they all follow the same naming pattern.

Edit 2: I decided to use a symlink as suggested in the comments, but now SeventhList is too short: it has 1428 elements when it should have 1432. Why is this, and how can I fix it? len(AllFileList) is 10000, which is correct.

I have a list of files MC_File0.root... MC_File9999.root.

They're all inside a directory MCAll and I want to split them into 7 equal chunks and copy them to their respective directories MCFirstSeventh, MCSecondSeventh, ...

Following advice from the first commenter I have a better chunking system than before, but the copying is still slow. Is there a way to speed it up or is it hardware limited?

import os

StartDir = "/data05/padme/beth/MC_Run660/RawMCFiles"
FirstEndDir   = "/data05/padme/beth/MC_Run660_FirstSeventh/RawMCFiles"
SecondEndDir  = "/data05/padme/beth/MC_Run660_SecondSeventh/RawMCFiles"
ThirdEndDir   = "/data05/padme/beth/MC_Run660_ThirdSeventh/RawMCFiles"
FourthEndDir  = "/data05/padme/beth/MC_Run660_FourthSeventh/RawMCFiles"
FifthEndDir   = "/data05/padme/beth/MC_Run660_FifthSeventh/RawMCFiles"
SixthEndDir   = "/data05/padme/beth/MC_Run660_SixthSeventh/RawMCFiles"
SeventhEndDir = "/data05/padme/beth/MC_Run660_SeventhSeventh/RawMCFiles"

AllFileList = os.listdir(StartDir)
#sorted(AllFileList, key=lambda x: int(x.split("_")[-1].split(".")[0]))

n = len(AllFileList)/7

FirstList = [AllFileList[i:i + n] for i in xrange(0, n, 1)]
SecondList = [AllFileList[i:i + n] for i in xrange(n, 2*n, 1)]
ThirdList = [AllFileList[i:i + n] for i in xrange(2*n, 3*n, 1)]
FourthList = [AllFileList[i:i + n] for i in xrange(3*n, 4*n, 1)]
FifthList = [AllFileList[i:i + n] for i in xrange(4*n, 5*n, 1)]
SixthList = [AllFileList[i:i + n] for i in xrange(5*n, 6*n, 1)]
SeventhList = [AllFileList[i:i + n] for i in xrange(6*n, len(AllFileList), 1)]

print 6*n

for ii in FirstList[0]:
    print ii
    os.symlink(os.path.join(StartDir,ii),os.path.join(FirstEndDir,ii))

for ii in SecondList[0]:
    print ii
    os.symlink(os.path.join(StartDir,ii),os.path.join(SecondEndDir,ii))

for ii in ThirdList[0]:
    print ii
    os.symlink(os.path.join(StartDir,ii),os.path.join(ThirdEndDir,ii))

for ii in FourthList[0]:
    print ii
    os.symlink(os.path.join(StartDir,ii),os.path.join(FourthEndDir,ii))

for ii in FifthList[0]:
    print ii
    os.symlink(os.path.join(StartDir,ii),os.path.join(FifthEndDir,ii))

for ii in SixthList[0]:
    print ii
    os.symlink(os.path.join(StartDir,ii),os.path.join(SixthEndDir,ii))

for ii in SeventhList[0]:
    print ii
    os.symlink(os.path.join(StartDir,ii),os.path.join(SeventhEndDir,ii))
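
For what it's worth, the 1428 seems to follow from the integer division. In Python 2, `len(AllFileList)/7` is `10000/7 = 1428`, so `6*n = 8568`. Each `...List` is a list of slices and only element `[0]` of it is used; `SeventhList[0]` is `AllFileList[8568:8568 + 1428]`, which stops at index 9996 and leaves out the last four files. Below is a minimal sketch of one way to split so that the last chunk absorbs the remainder. `split_with_remainder` is a hypothetical helper name, and the file names are generated only for illustration rather than read from `StartDir`.

```python
def split_with_remainder(items, parts):
    """Split items into `parts` consecutive chunks; the last chunk takes any remainder."""
    # Floor division: same value as Python 2's len(items)/parts for integers,
    # and it also works unchanged on Python 3.
    n = len(items) // parts
    chunks = [items[i * n:(i + 1) * n] for i in range(parts - 1)]
    # Open-ended slice, so the leftover files are not dropped.
    chunks.append(items[(parts - 1) * n:])
    return chunks


# Illustrative stand-in for os.listdir(StartDir) (assumed names, not the real listing)
all_files = ["MC_File%d.root" % i for i in range(10000)]
chunks = split_with_remainder(all_files, 7)
sizes = [len(c) for c in chunks]
print(sizes)  # [1428, 1428, 1428, 1428, 1428, 1428, 1432]
```

With the chunks in one list, the seven symlink loops could probably be collapsed into a single loop over pairs of chunk and destination directory.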
Beth Long
  • Does this answer your question? [How do I split a list into equally-sized chunks?](https://stackoverflow.com/questions/312443/how-do-i-split-a-list-into-equally-sized-chunks) and/or [Splitting a list into N parts of approximately equal length](https://stackoverflow.com/questions/2130016/splitting-a-list-into-n-parts-of-approximately-equal-length) – JonSG May 11 '23 at 17:19
  • `sorted(AllFileList, key=lambda x: int(x.split("_")[-1].split(".")[0]))` gives you the sorted list of files for a better starting point. If this is about speed, is copying a requirement? Or can you just symlink instead? – Matthias Huschle May 11 '23 at 17:37
  • ...or a hard link (os.link) to copy (it seems you just copy to a subdirectory)? https://docs.python.org/3/library/os.html – mrxra May 11 '23 at 17:55
  • Turns out the symlink works just fine, thank you for that tip! – Beth Long May 11 '23 at 17:59
  • @MatthiasHuschle that line gives me this error: ```File "Rsyncingfiles.py", line 14, in <module> sorted(AllFileList, key=lambda x: int(x.split("_")[-1].split(".")[0])) File "Rsyncingfiles.py", line 14, in <lambda> sorted(AllFileList, key=lambda x: int(x.split("_")[-1].split(".")[0])) ValueError: invalid literal for int() with base 10: '2Run660'``` – Beth Long May 11 '23 at 18:20
  • Closing this because it's changed too much from my original question – Beth Long May 11 '23 at 19:59
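
Regarding the `ValueError` in the comments above: the suggested key assumes that everything between the last underscore and the first dot is a number, and `'2Run660'` suggests at least one name in the directory does not fit that pattern. Here is a minimal sketch of a more tolerant key, assuming the file index is the run of digits immediately before `.root`. `file_index` is a hypothetical helper and the sample names are made up for illustration. Note that `sorted` returns a new list, so its result has to be assigned to have any effect.

```python
import re


def file_index(name):
    """Return the integer just before '.root', or -1 if the name has none."""
    match = re.search(r"(\d+)\.root$", name)
    return int(match.group(1)) if match else -1


# Made-up names for illustration; the real list would come from os.listdir(StartDir)
names = ["MC_File10.root", "MC_File2.root", "notes.txt", "MC_File0.root"]
ordered = sorted(names, key=file_index)  # numeric order, not alphabetical
print(ordered)  # ['notes.txt', 'MC_File0.root', 'MC_File2.root', 'MC_File10.root']
```

Names that do not match sort first here, which makes them easy to spot; whether they should be skipped instead depends on what else lives in the directory.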

0 Answers