2

We have a process here at work where an input file is created by SAS. That input file is then read by a legacy application and that legacy application creates results. SAS then reads the results and summarizes them. A non-programmer usually takes care of these actions one by one. So the person just creates the input file. They know when it is done, and then they run the legacy application, and they know when that is done. Then they run the summary program.

I have a situation where my boss wants to run about 100 variations. I have access to 3 or 4 computers that share a network drive. Here is my plan: Use computer A, I start creating the 100 input files, one by one. Using computer B, I run the legacy program on each input file. I would like to start running the program when input is ready. So if input1 is done being created on computer A, I want to run the legacy application on input1 on computer B while input2 is being created on computer A. I know python best, so I will probably use python to glue all of this together.

Now I know there is a lot of things I could do, but I think this approach is adequate and will allow me to just get the job done for the time being. I don't really have time to design and test a very elegant solution that would take advantage of all the cores on all the machines or use a database to help me synchronize all of this. I appreciate suggestions like this, but I really just want to know if there is a way, in python, to tell if a file on a network drive is open for writing by any application on any computer? If not, I will probably just come up with a dumb way to create an indicator that the job is done - like create a file "doneA" where if it exists, it means the "input1" file is complete. For example. I would add a step to the sas program that creates an indicator file after the input file has been created.

Sorry for the really long explanation, but I just don't want you to waste your time offering alternate solutions that I probably will not be able to implement.

I have read this question and its responses. I don't think I can use anything like lsof b/c these files would be open on different computers.

Community
  • 1
  • 1
oob
  • 1,948
  • 3
  • 26
  • 48

2 Answers2

2

Write output to a temp file. When done writing, close it, then rename it to the name the other program is waiting for. That way, the file only appears when it is ready to be read.

Steven Rumbalski
  • 44,786
  • 9
  • 89
  • 119
  • I decided to use this method. I haven't had any problems so far. My python code basically busy loops until the file exists and then does some stuff. I could probably get fancier and pass signals from each of the components to each other, but i probably will never re-use this code. Thanks! – oob Nov 13 '10 at 21:26
1

if there is a way, in python, to tell if a file on a network drive is open for writing by any application on any computer?

Not really.

Windows will let you open the file several times and really muck things up.

You have to use some explicit synchronization. Rather than synchronize each of the three steps 100 different ways, my preference is to do the following. Create 100 copies of the three-step dance. Don't worry about synchronization among the steps.

for variant in range(100):
    name= "variant_{0}.bat".format(variant)
    with open(name,"w") as script:
        print( "run some SAS thing", file=script )
        print( "run some legacy thing", file=script )
        print( "run some SAS thing", file=script )
    subprocess.Popen( "start {0}".format(name), shell=True )

I suspect that this will crush the life out of your processor by running all 100 in parallel.

As a practical matter, you probably don't want to actually use subprocess.Popen() in Python. As a practical matter you probably want to create several of these "start variant_x" batch files that can run a few variants in parallel. You can create some kind of master bat file that runs a sequence of processing steps. Each step starts several parallel 3-step variants.

oob
  • 1,948
  • 3
  • 26
  • 48
S.Lott
  • 384,516
  • 81
  • 508
  • 779
  • this is interesting. problem is that i only have the license for the legacy program on one computer which happens to be my personal computer! fun stuff. so i was planning on my computer being computer B. your solution would work though, just not quite as fast as i want, but if i go this route, i will just run over night. – oob Nov 09 '10 at 21:17