
This is a follow-up of sorts to this question about using NamedTemporaryFile().

I have a function that creates and writes to a temporary file. I then want to use that file in a different function, which calls a terminal command that uses that file (the program is from the Blast+ suite, blastn).

from subprocess import Popen
from tempfile import NamedTemporaryFile

def db_cds_to_fna(collection="genes"):  # collection gets data from mongoDB

    tmp_file = NamedTemporaryFile()
    # write stuff to file

    return tmp_file

def blast_all(blast_db, collection="genes"):

    tmp_results = NamedTemporaryFile()
    db_fna = db_cds_to_fna(collection)  # should return another file object

    Popen(
        ['blastn',
         '-query', db_fna.name,
         '-db', blast_db,
         '-out', tmp_results.name,
         '-outfmt', '5']  # xml output
    )

    return tmp_results

When I call blast_all, I get an error from the blastn command:

Command line argument error: Argument "query". File is not accessible:  `/var/folders/mv/w3flyjvn7vnbllysvzrf9y480000gn/T/tmpAJVWoz'

But just before the Popen call, os.path.isfile(db_fna.name) evaluates to True. I can also do

print Popen(['head', db_fna.name]).communicate(0)

And it properly spits out the first lines of the file. So the file exists, and it's readable. Further, I use the same strategy to call a different program from the same Blast+ suite (makeblastdb, see the question linked at the top) and it works. Is there possibly some problem with permissions? FWIW, blastn returns the same error if the file doesn't exist, but it seems clear that I'm correctly creating the file and that it's readable when I make the Popen call, so I'm stumped.

kevbonham

3 Answers


I think the problem may be that the OS has not synced the file to disk. After you write to the file, do:

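import os

tmp_file.flush()             # push Python's internal buffer to the OS
os.fsync(tmp_file.fileno())  # ask the OS to commit the data to disk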

https://docs.python.org/3/library/os.html#os.fsync
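A minimal Python 3 sketch of the flush() step, assuming a POSIX system (macOS or Linux, as the /var/folders path in the question suggests) where an open NamedTemporaryFile can be reopened by name from another process. A short Python child process stands in for blastn, and `read_in_child` is a hypothetical helper. The sketch only exercises flush(); os.fsync() additionally forces the data to physical storage.

```python
import subprocess
import sys
from tempfile import NamedTemporaryFile


def read_in_child(path):
    """Hypothetical helper: read `path` from a separate process (a stand-in for blastn)."""
    child_code = "import sys; print(open(sys.argv[1]).read(), end='')"
    result = subprocess.run(
        [sys.executable, "-c", child_code, path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


tmp_file = NamedTemporaryFile(mode="w", suffix=".fna")
tmp_file.write(">seq1\nACGT\n")

# Before flush(), the text may still sit in Python's write buffer,
# so the child process can see an empty or truncated file.
seen_before_flush = read_in_child(tmp_file.name)

tmp_file.flush()  # hand the buffered data to the OS
seen_after_flush = read_in_child(tmp_file.name)

tmp_file.close()  # closing also deletes the temporary file
print(repr(seen_before_flush), repr(seen_after_flush))
```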

maarten

I've had a very similar problem at some point. I was searching for ages, thinking I was never going to find the cause.

In my case, the issue was due to file-system latency. I think I ended up putting a dirty hack in place, using time.sleep to give the file system some time to create the temp file before accessing it in the subprocess.

Hope that helps!

Kris
  • But if this were the problem, why does the `Popen(['head', db_fna.name])` run just fine? The temp file is there and written... – kevbonham Feb 11 '16 at 18:18
  • Ha! Your comment reminded me of something... I put a `.wait()` at the end of the `Popen` call, and it worked. The `blastn` function takes a while to run, I bet the function kept running, and returned the results file, at which point the input file went out of scope and was garbage collected/deleted before `blastn` was done with it. – kevbonham Feb 11 '16 at 18:21
  • To be honest, I'm not exactly sure. Have you tried it though? – Kris Feb 11 '16 at 18:22
  • Nice, well I'm happy you didn't need to do a `sleep`! – Kris Feb 11 '16 at 18:23
  • Do you want to amend your answer to include the answer so I can mark it? Otherwise, I'll just write an answer to my own question - but since you triggered the idea, I think you should get the credit. – kevbonham Feb 11 '16 at 18:30
  • Oh, no worries, you write it! – Kris Feb 11 '16 at 19:02
  • Hey, by the way, I just looked back at my old code and I also did it correctly with `.wait()`, as you suggested. – Kris Feb 11 '16 at 19:05

I believe I figured out the things conspiring to cause this behavior. First, the Popen() function does not normally wait until the external command finishes before proceeding past it. Second, as user glibdud mentioned in his answer to my other question, NamedTemporaryFile acts like TemporaryFile in that

It will be destroyed as soon as it is closed (including an implicit close when the object is garbage collected).

Since my blast_all() function does not return (or otherwise keep a reference to) the query temp file, that file object is closed and garbage collected when blast_all() returns, while the external blastn command is still running, so the file is deleted out from under it. I'm guessing that the external head command finishes so quickly that it doesn't encounter this problem, but blastn can take up to a couple of minutes to run.
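To illustrate both effects without BLAST, here is a Python 3 sketch that reproduces the race. It assumes CPython (which finalizes an object as soon as its reference count drops to zero) on a POSIX system; `make_query_file` and `start_without_waiting` are hypothetical stand-ins for db_cds_to_fna() and the original blast_all(), and a Python child that sleeps before opening the file plays the part of a slow blastn.

```python
import os
import subprocess
import sys
from tempfile import NamedTemporaryFile


def make_query_file():
    """Hypothetical stand-in for db_cds_to_fna(): returns the temp file object."""
    tmp = NamedTemporaryFile(mode="w", suffix=".fna")
    tmp.write(">seq1\nACGT\n")
    tmp.flush()
    return tmp


def start_without_waiting():
    """Hypothetical stand-in for the original blast_all(): starts the child and returns."""
    query = make_query_file()
    # The child sleeps first (like a slow blastn) and only then opens the query file.
    child = subprocess.Popen(
        [sys.executable, "-c",
         "import sys, time; time.sleep(0.5); open(sys.argv[1]).read()",
         query.name],
        stderr=subprocess.DEVNULL,  # hide the child's FileNotFoundError traceback
    )
    still_running = child.poll() is None  # Popen returned while the child is still busy
    # Only the file name escapes this function. When it returns, `query` held the
    # last reference to the temp file object, so CPython finalizes it right away,
    # which closes and deletes the file.
    return child, query.name, still_running


child, path, still_running = start_without_waiting()
file_exists_after_return = os.path.exists(path)
returncode = child.wait()  # the child fails: the file it wanted is already gone
print(still_running, file_exists_after_return, returncode)
```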

So the solution is to make blast_all() wait for the blastn process to finish, by calling .wait() on the Popen object:

Popen(
    ['blastn',
     '-query', db_fna.name,
     '-db', blast_db,
     '-out', tmp_results.name,
     '-outfmt', '5']  # xml output
).wait()  # block until blastn exits; db_fna (and its file) stays alive until then
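Putting the fix together, here is a self-contained Python 3 sketch of the corrected flow, again assuming POSIX and using a Python child as a stand-in for blastn (`FAKE_BLASTN`, `make_query_file` and `run_search` are hypothetical names, not part of BLAST+). The query file is used as a context manager so it stays open, and therefore on disk, until proc.wait() returns, and a non-zero exit status raises instead of being silently ignored.

```python
import subprocess
import sys
from tempfile import NamedTemporaryFile

# Hypothetical stand-in for blastn: waits briefly, then writes an upper-cased
# copy of the query file (argv[1]) to the output file (argv[2]).
FAKE_BLASTN = [sys.executable, "-c",
               "import sys, time; time.sleep(0.2); "
               "open(sys.argv[2], 'w').write(open(sys.argv[1]).read().upper())"]


def make_query_file(records):
    """Hypothetical stand-in for db_cds_to_fna()."""
    tmp = NamedTemporaryFile(mode="w", suffix=".fna")
    for name, seq in records:
        tmp.write(f">{name}\n{seq}\n")
    tmp.flush()  # make the data visible to the child process
    return tmp


def run_search(records):
    """Sketch of blast_all() that keeps the query file alive until the child exits."""
    tmp_results = NamedTemporaryFile(mode="w+", suffix=".xml")
    with make_query_file(records) as db_fna:
        proc = subprocess.Popen(FAKE_BLASTN + [db_fna.name, tmp_results.name])
        returncode = proc.wait()  # block here; db_fna is still open and on disk
    # Leaving the `with` block closes and deletes the query file.
    if returncode != 0:
        tmp_results.close()
        raise subprocess.CalledProcessError(returncode, proc.args)
    tmp_results.seek(0)  # rewind before handing the results back
    return tmp_results


results = run_search([("seq1", "acgt")])
text = results.read()
results.close()
print(text)
```

subprocess.run(cmd, check=True) would do the same waiting and error check in one call; Popen plus wait() is kept here to mirror the answer.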
kevbonham