
I have a Python script (grouper.py) that accepts one argument as input. Currently, due to the size of the input file, I must break the input up into 20 chunks, open 20 terminals, and run all 20 at once.

Is there a way to loop through all 20 input arguments, kicking off a Python process for each?

import pandas as pd

def fun(i):
    j = pd.read_csv(i)    # pd.read() does not exist; pd.read_csv() reads the file
    # do some heavy processing
    return j

for i in inputfiles:
    print(i)
    outputfile = fun(i)
    # write each result to its own file so each pass does not overwrite the last
    outputfile.to_csv('outputfile_' + i, index=False)

The above code of mine processes each input file one at a time... Is there a way to run all 20 input files at once?

Thanks!!

BobcatBlitz
  • Does this answer your question? [Is it possible to run function in a subprocess without threading or writing a separate file/script.](https://stackoverflow.com/questions/2046603/is-it-possible-to-run-function-in-a-subprocess-without-threading-or-writing-a-se) – mkrieger1 Jan 17 '20 at 20:16

1 Answer


Q : Script to kick-off multiple instances of another script that takes an input parameter?

GNU parallel solves this straight from the CLI:

parallel python grouper.py {} ::: file1 file2 file3 ...... file20

Given a machine with 20+ CPU cores runs this, the # do some heavy processing step need not stay constrained to just-[CONCURRENT] CPU-scheduling and may indeed perform the work in an almost [PARALLEL] fashion (as long as there are no race conditions on shared resources).

for i in inputfiles:
    print(i)
    outputfile=fun(i)
    ...

is a pure-[SERIAL] iterator that produces just a sequence of passes, so launching the processes straight from the CLI may be the cheapest possible solution. Python joblib and other multiprocessing tools can spawn copies of the running Python interpreter, yet that comes at a rather remarkable add-on cost if batch processing driven from a single CLI command suffices for the target: turning a known list of input files into another set of output files.

user3666197
    Also: If you are splitting a big file into chunks to process each chunk, look at `parallel --pipepart`. It may just be doing that for you on-the-fly. That way you do not need to have space for two copies of `bigfile`. – Ole Tange Jan 19 '20 at 14:11