
I have a Python script which takes a filename as a command-line argument and processes that file. However, I have thousands of files to process, and I would like to run the script on every file without having to pass each filename by hand.

The script works well when run on an individual file like this:

myscript.py /my/folder/of/stuff/text1.txt

I have this code to do them all at once, but it doesn't work:

for fname in glob.iglob(os.path.join('folder/location')):
    proc = subprocess.Popen([sys.executable, 'script/location.py', fname])
    proc.wait()

Whenever I run the above code, it doesn't throw an error, but doesn't give me the intended output. I think the problem lies with the fact that the script is expecting the path to a .txt file as an argument, and the code is only giving it the folder that the file is sitting in (or at least not a working absolute reference).

How can I correct this problem?

Damian Yerrick
cars0245
  • 1
    Why not edit `myscript.py` and split it up into functions? You then can do `from myscript import my_function` and call `my_function` on every file you need. – Blender Jun 17 '15 at 17:11
  • 1
    The `os.path.join('folder/location')` does nothing. Try `os.path.join('folder/location', '*.txt')` — one usually passes a file name pattern argument with wildcard characters in it to `glob.iglob()`. – martineau Jun 17 '15 at 17:21
  • related: [Call python script with input with in a python script using subprocess](http://stackoverflow.com/q/30076185/4279) and [Python threading multiple bash subprocesses?](http://stackoverflow.com/a/14533902/4279) – jfs Jun 21 '15 at 12:11
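The point made in the comments about the missing wildcard can be checked directly. The following is a minimal sketch, assuming a throwaway temporary folder containing one file named text1.txt (both are created only for illustration): without a pattern, `glob.iglob()` yields the folder itself, so the script receives a directory instead of a .txt file.

```python
import glob
import os
import tempfile

# Hypothetical setup: a temporary folder holding a single text file.
folder = tempfile.mkdtemp()
open(os.path.join(folder, "text1.txt"), "w").close()

# os.path.join() with one argument returns it unchanged, so glob only
# sees the folder path and yields that folder, not the files inside it.
without_pattern = list(glob.iglob(os.path.join(folder)))

# Adding a wildcard pattern makes glob yield the individual files.
with_pattern = list(glob.iglob(os.path.join(folder, "*.txt")))

print(without_pattern)
print(with_pattern)
```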

2 Answers


If the files are in the same folder and the script accepts several file names, you could use this syntax:

myscript.py /my/folder/of/stuff/*.txt

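The shell replaces the wildcard with the matching file names before the script starts. This holds for Unix-like shells such as bash; the Windows command prompt does not expand wildcards.

If the script doesn't support several file names, isolate the per-file processing in a function, as in this quick example: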

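import sys

def printFileName(filename):
  # Replace this with the real per-file processing
  print(filename)

def main():
  # Everything after the script name is treated as a file to process
  args = sys.argv[1:]
  for filename in args:
    printFileName(filename)

if __name__ == '__main__':
  main()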

Then, from the console, you can start it like this:

python MyScript.py /home/andy/tmp/1/*.txt /home/andy/tmp/2/*.html

This will print the paths of all the matching files in both folders.

Hope this can be of some help.
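If the shell does not expand wildcards, or the original script cannot be changed, a small driver can do the expansion in Python instead. This is only a sketch, close to the loop in the question but with a wildcard pattern added; `run_on_all` and `run_all.py` are hypothetical names, and the `*.txt` default is an assumption taken from the question.

```python
import glob
import os
import subprocess
import sys


def run_on_all(script, folder, pattern="*.txt"):
    """Run `script` once per file in `folder` matching `pattern`.

    Returns a list of (path, return_code) pairs.
    """
    results = []
    # The wildcard is what makes glob return individual files
    # instead of the folder itself.
    for fname in sorted(glob.glob(os.path.join(folder, pattern))):
        code = subprocess.call([sys.executable, script, fname])
        results.append((fname, code))
    return results


if __name__ == "__main__":
    # Hypothetical usage: python run_all.py myscript.py /my/folder/of/stuff
    if len(sys.argv) == 3:
        for path, code in run_on_all(sys.argv[1], sys.argv[2]):
            print(path, "exit code", code)
```

Passing the arguments as a list avoids quoting problems with file names that contain spaces.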

Andy M

You can write another script to do this. As a workaround, try os.walk:

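import os, subprocess, sys

PATH = '/my/folder/of/stuff'  # top-level folder to search

for root, dirs, files in os.walk(PATH):
    for file in files:
        # os.path.join is portable; the list form avoids shell quoting issues
        subprocess.call([sys.executable, 'myscript.py', os.path.join(root, file)])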

Pass the path of the top-level folder to os.walk as PATH; it visits every file in that folder and in all of its subfolders.

If you want to process only specific files, for example only .cpp files, filter the file names by adding this check right after the for file in files line and indenting the call to the script under it:

if file.endswith('.cpp'):

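Put together, the walk and the filter could look like the sketch below. `find_files` is a hypothetical helper name, and the `.cpp` default is just the example suffix used above.

```python
import os


def find_files(top, suffix=".cpp"):
    """Yield the path of every file under `top` whose name ends with `suffix`."""
    for root, dirs, files in os.walk(top):
        for file in files:
            if file.endswith(suffix):
                # os.path.join uses the right separator on every platform.
                yield os.path.join(root, file)
```

Each yielded path can then be handed to the script, one call per file.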
Bharadwaj