I searched for solutions and tried them, but got the same result each time. I tried Popen.wait(), run() and call(). As other users suggested, I also tried passing the command as a list of strings, which didn't work either. The subprocess call doesn't raise an error, so that's not the issue.

Here's the function:

def blast(file):
    command = f'blastn -query {output_path}fasta_files/{file} -db {db_path} -max_hsps 1 -max_target_seqs 40 -num_threads 4 -evalue 1e-5 ' \
              f'-out {output_path}blast/{file[:-2]}txt -outfmt "6 qseqid sseqid pident staxids sskingdoms qstart qend ' \
              f'qlen length sstart send slen evalue mismatch gapopen bitscore stitle"'
    subprocess.Popen(command, stdout=subprocess.PIPE, shell=True).wait()

Here's the call to the function:

import blastn
from process_blast_output import *
from remove_false_sequences import *
import os

directory = '/some/path/'


if __name__ == '__main__':
    for file in os.listdir(directory + 'fasta_files'):
        if 'btcaA1_trimmed' in file:
            blastn.blast(f'{file}') # That's where the function is called
            dataframe = get_dataframe(directory + f'blast/{file[:-2]}txt')
            dataframe = get_taxonomic_data(dataframe)
            delete_false_hits(fasta_to_dictionary(dataframe), directory + f'fasta_files/{file[:-2]}fa')

Instead of passing a string I also tried passing a list:

subprocess.Popen(['blastn', '-query', f'{output_path}fasta_files/{file}', '-db', f'{db_path}', '-max_hsps', '1',
                  '-max_target_seqs', '40', '-num_threads', '4', '-evalue', '1e-5', '-out',
                  f'{output_path}blast/{file[:-2]}txt', '-outfmt', "6 qseqid sseqid pident staxids sskingdoms "
                                                                   "qstart qend qlen length sstart send slen evalue"
                                                                   " mismatch gapopen bitscore stitle"],
                 stdout=subprocess.PIPE).wait()
Ziv
  • I didn't understand what the problem is. What do you mean by "doesn't hold the script"? – mkrieger1 May 03 '21 at 09:07
  • BTW instead of passing the command as a string and using `shell=True`, it is preferable to pass the command as a list and use `shell=False` (the default). – mkrieger1 May 03 '21 at 09:08
  • I'm trying to pause / hold the script that calls the function, until the subprocess in the function is through. – Ziv May 03 '21 at 09:29
  • If you have used `wait()` and the script continues, then the subprocess must have finished. Either the subprocess has triggered additional processes that continue running, or you made another mistake which we can't see here. You should show a [mre]. – mkrieger1 May 03 '21 at 09:37
  • I tried your suggestion. I got another error from the program I called with subprocess. I used split() on the original command to get a list, and passed it to subprocess. I guess that's not the right way to pass a list to subprocess. – Ziv May 03 '21 at 09:37
  • No you should just write that list yourself: `command = ['blastn', '-query', f'{output_path}fasta_files/{file}', '-db', db_path, ...]` – mkrieger1 May 03 '21 at 09:38
  • Ok. I'll add the code from which the function is called. I hope that will suffice. – Ziv May 03 '21 at 09:40
  • Can you explain more about how you concluded that the script doesn't wait for the subprocess to finish? – mkrieger1 May 03 '21 at 09:45
  • First, I saw an error about a file not existing. This file is written in the function that needs to be suspended / paused. To be sure I added a print function right after the subprocess call, which printed to the job's output file. – Ziv May 03 '21 at 09:51
  • I just tried passing the list itself, not a variable holding the list, and got the same error from the program that's called from the function. Also, I wouldn't get this error from the program if passing the variable didn't work, yet I did get that message, so it seems that passing the variable is ok. – Ziv May 03 '21 at 09:58
  • I can provide the list of arguments that were passed with subprocess, if it's relevant. – Ziv May 03 '21 at 10:00

1 Answer

Probably the actual problem is that you were setting stdout=subprocess.PIPE but then ignoring the output. If you want to discard any output, use stdout=subprocess.DEVNULL; if you want to allow the subprocess to write to standard output normally, just don't set stdout at all.
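As a minimal sketch of the difference, the snippet below uses a child Python process as a hypothetical stand-in for a chatty external program (it is not blastn). It discards the output in one call and captures it in the other, which is the only case where PIPE makes sense:

```python
import subprocess
import sys

# Hypothetical stand-in for an external program that prints to stdout.
child = [sys.executable, '-c', 'print("hello from child")']

# Discard the child's output entirely.
discarded = subprocess.run(child, stdout=subprocess.DEVNULL, check=True)

# Use PIPE only when you actually intend to read the output afterwards.
captured = subprocess.run(child, stdout=subprocess.PIPE, text=True, check=True)

print(discarded.stdout)        # nothing was captured in the first call
print(captured.stdout.strip()) # the line printed by the child
```

With PIPE and no reader, a child that writes a lot can fill the pipe buffer and block, which is why discarding or not redirecting is the safer default.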

Whether you use shell=True (and a first argument consisting of a single string for the shell to parse) or not (in which case the first argument should be a list of properly tokenized strings) has no bearing on whether the subprocess is waited for.
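What "properly tokenized" means matters here because of the quoted -outfmt value. A small sketch, using a shortened and hypothetical version of the command (the file name in.fa is made up), shows why calling split() on the command string produces a broken argument list, while shlex.split() follows shell quoting rules:

```python
import shlex

# Shortened, hypothetical version of the command from the question.
command = 'blastn -query in.fa -outfmt "6 qseqid sseqid pident"'

naive = command.split()        # breaks the quoted -outfmt value into pieces
tokens = shlex.split(command)  # keeps the quoted value as one argument

print(naive)
print(tokens)
```

Writing the list out by hand is still the clearest option; shlex.split() is mainly useful when the command arrives as a single string.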

You should generally avoid Popen, which does not wait by default. subprocess.run() and its legacy cousins check_call() et al. do wait for the external subprocess.

Generally, probably avoid shell=True if you can.

def blast(file):
    # output_path and db_path are module-level settings, as in the question
    subprocess.run(
        ['blastn', '-query', f'{output_path}fasta_files/{file}',
         '-db', db_path, '-max_hsps', '1', '-max_target_seqs', '40',
         '-num_threads', '4', '-evalue', '1e-5',
         '-out', f'{output_path}blast/{file[:-2]}txt',
         '-outfmt', "6 qseqid sseqid pident staxids sskingdoms qstart qend "
                    "qlen length sstart send slen evalue mismatch gapopen "
                    "bitscore stitle"],
        stdout=subprocess.DEVNULL, check=True)
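If you want to convince yourself that run() really blocks, here is a rough sketch. It assumes nothing about blastn: the child is a small Python one-off (a hypothetical stand-in) that sleeps and then writes its output file, and the parent checks for the file right after run() returns.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, 'result.txt')  # hypothetical output file

    # Stand-in for a slow external program: sleep, then write the file
    # whose path is passed as the first argument.
    child_code = (
        'import sys, time\n'
        'time.sleep(0.5)\n'
        'with open(sys.argv[1], "w") as fh:\n'
        '    fh.write("done\\n")\n'
    )

    subprocess.run([sys.executable, '-c', child_code, out_path],
                   stdout=subprocess.DEVNULL, check=True)

    # run() has returned, so the child has exited and the file is in place.
    file_exists = os.path.exists(out_path)
    with open(out_path) as fh:
        contents = fh.read()

print(file_exists, contents.strip())
```

If the file is missing at this point with the real program, the cause is more likely the command itself (wrong arguments, wrong path) than the waiting.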

The subprocess you created will be waited for, but it is of course still possible that it created detached subprocesses of its own, which Python cannot directly wait for if the subprocess hides this from the caller.

As an aside, your if __name__ == '__main__' code should be trivial; if you put all the useful code in this block, there is no way the file can be useful to import into another script anyway, and so the whole __name__ check is pointless. The purpose of this is so you can say

def useful_code():
    # lots of code here
    ...

if __name__ == '__main__':
    useful_code()

Now, if you run python scriptname.py, then __name__ will be __main__ and so the call to useful_code() will be executed immediately. But if you import scriptname (assuming you have set things up so that you can do this, with a correct sys.path and so forth) that will not cause useful_code to be run immediately; instead, the caller decides if and when they actually want to run this function (or some other function from the module, if it contains several).

As a further aside, f'{file}' is just a really clumsy way to say file (or str(file) if the variable wasn't already a string).

tripleee
  • If you need concurrent execution then you actually need `Popen`; the `run` function only returns when the subprocess finishes, but then, so does `Popen.wait`, so this requirement was not clear from your question. The subprocess may need to explicitly `flush()` its output handle. – tripleee May 13 '21 at 11:29
  • The more I reread your question and the comment above, the less clear it is. Your question seems to ask how to make the caller wait for the subprocess to finish, so that's what I answered. If you don't set the `stdout` and `stderr` keyword arguments, the output will be printed to standard output and standard error as it's generated, modulo OS buffering, while your Python script waits for it to finish. If you want the output to be written to a file instead, you can open a file handle for writing and redirect output there; `with open(filename, 'w') as fh: subprocess.run(command, stdout=fh)` – tripleee May 13 '21 at 11:40
  • And again, repeat emphatically, you don't want or need `shell=True` here. See [Actual meaning of `shell=True` in `subprocess`](https://stackoverflow.com/questions/3172470/actual-meaning-of-shell-true-in-subprocess) – tripleee May 13 '21 at 11:44
  • Ok. Thanks. The command includes an output file path for the blastn program to write to (it's mandatory). When the program runs, it writes periodically to that file. When I ran the script with the command you suggested, but with a closely related program called blastx, the script waited for the blastx program to finish as intended, yet no output was written to the output file, so one option was that the subprocess command was the reason. After checking more options, I found out that's not the case. – Ziv May 19 '21 at 07:52
  • You would get that if you did not pass a _list_ as the first argument, or if the first token in that list is not the name of a command in your `PATH`. – tripleee May 19 '21 at 12:15