I have a script that takes a list of filenames as input
and loops over them to generate one output file per input file, so I think this is a case that can be easily parallelized.
I have an 8-core machine.
I tried using the -parallel flag on this command:
python perfile_code.py list_of_files.txt
But I can't make it work. My specific question is: how do I use parallel in bash with a Python command on Linux, along with the arguments for the specific case described above?
There is a Linux parallel command (sudo apt-get install parallel), which I read somewhere can do this job, but I don't know how to use it.
Most internet resources explain how to do this in Python, but can it be done in bash?
Please help, thanks.
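For reference, the behaviour being asked for (one process per listed file, at most 8 at a time) can be sketched in Python as below. This is only an illustration under assumptions: it presumes a per-file script that accepts a single file name as its argument, which perfile_code.py as described does not (it takes the list file). The helper names `read_file_list` and `run_per_file` are made up for this sketch. With GNU parallel, the equivalent shell form should be roughly `parallel -j 8 python process_one.py {} :::: list_of_files.txt`, where `process_one.py` is a hypothetical per-file script, `-j` sets the number of simultaneous jobs, and `::::` reads the arguments from a file.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def read_file_list(list_path):
    """Return the non-empty, whitespace-stripped lines of the list file."""
    with open(list_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def run_per_file(command, filenames, jobs=8):
    """Run `command + [filename]` once per file, at most `jobs` at a time.

    Returns a list of (filename, exit_code) pairs in input order.
    """
    def run_one(filename):
        completed = subprocess.run(command + [filename])
        return filename, completed.returncode

    # Threads are enough here: each one only waits on a child process.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, filenames))
```

A call such as `run_per_file([sys.executable, "process_one.py"], read_file_list("list_of_files.txt"))` would then start one interpreter per file name, again assuming the hypothetical `process_one.py` exists.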
Based on an answer, here is an example that is still not working; please suggest how to make it work.
I have a folder with 2 files. In this example I just want to create duplicates of them under a different name, in parallel.
# filelist is the directory containing two files, a.txt and b.txt.
# a.txt is the first file, b.txt is the second file
# I pass a .txt file with both names to the main program
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path
import sys

def translate(filename):
    print(filename)
    f = open(filename, "r")
    g = open(filename + ".x", "w")
    for line in f:
        g.write(line)

def main(path_to_file_with_list):
    futures = []
    with ProcessPoolExecutor(max_workers=8) as executor:
        for filename in Path(path_to_file_with_list).open():
            executor.submit(translate, "filelist/" + filename)
        for future in as_completed(futures):
            future.result()

if __name__ == "__main__":
    main(sys.argv[1])
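A possible corrected version of the example above is sketched here; it is a guess at the intended behaviour, not a definitive fix. Compared with the example, it strips the trailing newline from each listed name (iterating over an open file yields lines that still end in "\n"), keeps the futures returned by `submit` so that `as_completed` has something to wait on, and closes the files with `with` blocks. The helpers `read_names` and `run_all` and the `directory` parameter are names introduced only for this sketch.

```python
import sys
from concurrent.futures import ProcessPoolExecutor, as_completed
from pathlib import Path


def translate(filename):
    """Copy `filename` to `filename + ".x"` line by line; return the new name."""
    target = str(filename) + ".x"
    with open(filename, "r") as src, open(target, "w") as dst:
        for line in src:
            dst.write(line)
    return target


def read_names(path_to_file_with_list):
    """Return the listed file names without trailing newlines or blank lines."""
    with Path(path_to_file_with_list).open() as handle:
        return [line.strip() for line in handle if line.strip()]


def run_all(executor, names, directory="filelist"):
    """Submit one translate() call per name and wait for all of them."""
    futures = [executor.submit(translate, str(Path(directory) / name))
               for name in names]
    # result() re-raises any exception that occurred in a worker.
    return sorted(future.result() for future in as_completed(futures))


def main(path_to_file_with_list):
    with ProcessPoolExecutor(max_workers=8) as executor:
        for target in run_all(executor, read_names(path_to_file_with_list)):
            print(target)


# Only run when invoked as a script with the list file as an argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Calling `future.result()` matters beyond waiting: without it, an exception raised inside a worker (for example a file that cannot be found because of a stray newline in its name) is silently swallowed.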