
I have a Python program which generates lots of files in different folders on each iteration. After each iteration, I would like to delete a certain kind of file inside a specific folder: for example, all files with the extension *.recode.vcf should be removed.

I tried

os.remove("example.recode.vcf")

... but since the folder where it looks for the file might eventually contain lots of files, I was wondering what the most efficient way to do this would be. As an alternative, I thought about calling the bash find command. Something like...

find . -name \*.recode.vcf -type f -delete
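From Python, I guess I would invoke it roughly like this (a sketch, assuming a Unix-like system whose find supports -delete):

import subprocess

# Ask find to do the matching and deletion in one external process
subprocess.run(
    ["find", ".", "-name", "*.recode.vcf", "-type", "f", "-delete"],
    check=True,
)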

What do you think?

peixe

3 Answers


If by "efficiency" you mean speed, then please realize that the speed of this operation is determined by the filesystem (OS + hardware), not the implementation language. You can easily remove a bunch of files by using glob:

import os
from glob import glob  # or iglob, see documentation

# Remove every matching file in the current directory
for f in glob("*.recode.vcf"):
    os.remove(f)

(This won't recurse into subdirs; use os.walk for that.)
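If you do need recursion, a sketch along those lines (os.walk and fnmatch are both in the standard library):

import fnmatch
import os

# Walk the tree rooted at "." and remove every matching file
for dirpath, dirnames, filenames in os.walk("."):
    for name in fnmatch.filter(filenames, "*.recode.vcf"):
        os.remove(os.path.join(dirpath, name))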

Fred Foo

First, check whether there is really a difference between your approaches before asking for a solution. Maybe there is none, and you are losing time solving a problem that does not exist.

When you remove a file in Python, Python does not remove it by itself but rather asks the OS to kindly remove the file (which is good). So you can build a simple function (or better, an iterator) that recursively yields the files to remove, and that would be your Pythonic version of find. Who knows, maybe find is implemented in Python...? (okay okay, it is not, but it could be)
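Such an iterator could look like this (a sketch; find_files is a made-up name, but os.walk and fnmatch are standard library):

import fnmatch
import os

def find_files(root, pattern):
    # Yield paths of files under root whose names match pattern, find-style
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, name)

# The OS does the actual deletion; we just iterate
for path in find_files(".", "*.recode.vcf"):
    os.remove(path)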

See os.walk and glob mentioned in the other answers.

Edit:

In case of a large number of files, separate them into different directories. Instead of cramming 10,000 files into one directory, put them in 100 directories with 100 files each. To balance the files evenly between the directories, generate the directory name from, e.g., the last two characters of a hash of the filename or file content (similar to what git does).
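A sketch of that bucketing idea (bucketed_path is a made-up helper; hashlib and os.makedirs are standard library):

import hashlib
import os

def bucketed_path(base_dir, filename):
    # Bucket by the last two hex characters of a hash of the name,
    # so files spread roughly evenly across 256 subdirectories
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    bucket = os.path.join(base_dir, digest[-2:])
    os.makedirs(bucket, exist_ok=True)
    return os.path.join(bucket, filename)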

Jakub M.
  • Nice answer, @Jakub... ;) But the thing is that I already tried both, and as the number of files generated is huge, I saw some differences in performance. One point I did not mention is that I am running this on a cluster, and its performance varies depending on the load of jobs sent, as I am not the only user of it. So, what I would like to know is which method is more "efficient" by itself... – peixe Apr 25 '13 at 13:40
  • Can you say what the differences were? Number of files vs. time? Just curiosity – Jakub M. Apr 25 '13 at 14:09
  • The point is that this cluster depends a lot on the workload; sometimes, when it was not fully functional, the program broke down because of the many files inside the folder. That's why I wanted to know the best way to remove the files. – peixe Apr 25 '13 at 17:14

I usually like to keep close control over how I delete files, so I would suggest listing all the files you want deleted and then removing them one by one, like this:

import glob
import os

# Collect the matching files, then remove them one by one
myfiles = glob.glob("/mydirectory/*.vcf")
for f in myfiles:
    os.remove(f)

Cheers, Trond

Trond Kristiansen