I would like to process 2000 files on a 64-core machine. I have a Python script, foo.py, which I run like this:

cat file0000.txt|./foo.py > out0000.txt

Ideally I would like to split the 2000 files, file0000.txt to file1999.txt, into forty sets of 50 files each and run foo.py on each set in parallel. For sets 1 to 4 out of 40, that would be the equivalent of the following:

cat file00[0-4][0-9].txt | ./foo.py > outfile1.txt &
cat file00[5-9][0-9].txt | ./foo.py > outfile2.txt &
cat file01[0-4][0-9].txt | ./foo.py > outfile3.txt &
cat file01[5-9][0-9].txt | ./foo.py > outfile4.txt &

Sadly, the system I am running this on doesn't have GNU parallel installed, so I have to do this without that very useful tool.

The question "Bash script processing commands in parallel" looks similar, but its most popular answer is not directly relevant, and its second most popular answer uses parallel, which I don't have access to.

Simd
  • What is the problem with `xargs` and `-P max-procs` option? – Alper Jul 29 '16 at 07:54
  • @Alper That could be the answer but I have never used it. How would you use it for my problem? – Simd Jul 29 '16 at 07:55
  • Something like `ls -1 | xargs -I{} -P 5 sh -c "cat {} | ./foo.py > out{}.txt"`. Note: `ls -1` should list your input files; change `-P 5` as you like. – Alper Jul 29 '16 at 08:37
  • Can you elaborate on why you don't install GNU Parallel? As per https://oletange.blogspot.dk/2013/04/why-not-install-gnu-parallel.html – Ole Tange Jul 29 '16 at 16:42
  • @OleTange Sadly it's a centrally managed system and I am not allowed to. – Simd Jul 29 '16 at 16:46
  • @eleanora I apologise in advance, but I really do not understand how you can be allowed to run ./foo.py but not ./parallel. Can you explain the distinction why ./foo.py is allowed, but not ./parallel? – Ole Tange Jul 29 '16 at 16:53
  • @OleTange It's the installing part I am not allowed to do. I see from your very useful link that one option is to download it and run it as a normal user :) – Simd Jul 29 '16 at 17:52
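
The `xargs -P` one-liner suggested in the comments runs foo.py once per file. A minimal sketch of how it might be adapted to the sets of 50 described in the question follows. The helper name `run_batches` and the `out_<first file>.txt` output naming are assumptions made for illustration, as is the requirement that file names contain no whitespace or quotes. Note that `-P` is an extension found in GNU and BSD xargs, not part of POSIX.

```shell
# Hypothetical helper: run_batches CMD BATCH_SIZE MAX_JOBS FILE...
# Splits FILE... into batches of BATCH_SIZE, concatenates each batch into
# CMD's stdin, and runs up to MAX_JOBS batches at the same time.
# Assumption: file names contain no whitespace, quotes, or backslashes.
run_batches() {
  cmd=$1 size=$2 max_jobs=$3
  shift 3
  printf '%s\n' "$@" |
    xargs -n "$size" -P "$max_jobs" sh -c '
      cmd=$1; shift
      # "$1" is now the first file of this batch; name the output after it.
      cat "$@" | "$cmd" > "out_${1%.txt}.txt"
    ' sh "$cmd"
}
```

With the files from the question, this would be invoked as `run_batches ./foo.py 50 40 file*.txt`, producing one output per set (for example `out_file0000.txt` for the first 50 files).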

1 Answer

As per the comments: do a personal installation of GNU Parallel, which you are allowed to do if you are allowed to run your own scripts. From the unpacked source directory:

./configure --prefix=$HOME && make && make install

And then:

ls | ~/bin/parallel 'cat {} | ./foo.py > {= s/file/out/ =}'
Ole Tange
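
If GNU Parallel cannot be used at all, the four commands from the question can be generalized with plain shell job control. The following is a minimal sketch and not part of the answer above. The function name `run_sets` is hypothetical, and it assumes the inputs are named exactly file0000.txt, file0001.txt, and so on, with no gaps.

```shell
# Hypothetical helper: run_sets CMD N_SETS SET_SIZE
# Set k (counting from 1) covers files (k-1)*SET_SIZE .. k*SET_SIZE-1 and is
# written to outfile<k>.txt. All sets are started in the background at once.
# Assumption: inputs are named file0000.txt, file0001.txt, ... in the
# current directory.
run_sets() {
  cmd=$1 n_sets=$2 set_size=$3
  set_no=1
  while [ "$set_no" -le "$n_sets" ]; do
    first=$(( (set_no - 1) * set_size ))
    last=$(( first + set_size - 1 ))
    # Stream the files of this set, in order, into one CMD process.
    i=$first
    while [ "$i" -le "$last" ]; do
      cat "$(printf 'file%04d.txt' "$i")"
      i=$((i + 1))
    done | "$cmd" > "outfile${set_no}.txt" &
    set_no=$((set_no + 1))
  done
  wait   # block until every background set has finished
}
```

Calling `run_sets ./foo.py 40 50` would start all 40 sets at once, which fits within the 64 cores mentioned in the question. With more sets than cores, some form of throttling would be needed.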