
I am trying to call a bash script with subprocess.Popen inside a for loop. On each iteration, a new commit string from the list out is passed as an argument to the Popen call. The command invokes a bash script that prints a text file identified by the variable commit and greps certain lines from it. However, I can't get the output to flush inside the Python for loop. Right now, only the grepped data from the final commit in out ends up in my final data structure (a pandas DataFrame).

accuracy_dictionary = {}
for commit in out:
    accuracy_dictionary.setdefault(commit, {})
    p2 = subprocess.Popen(['~/Desktop/find_accuracies.sh', commit], encoding='utf-8', shell=True, stdout=subprocess.PIPE)
    outputstring = p2.stdout.read()
    # This part below is less critical to the problem at hand
    # I'm putting the data from each file in a dictionary
    for acc_type_line in outputstring.split('\n'):
        accuracy = acc_type_line.split(': ')
        if accuracy != ['']:
            acc_type = accuracy[0]
            value = accuracy[1]
            accuracy_dictionary[commit][acc_type] = float(value)

acc_data = pd.DataFrame.from_dict(accuracy_dictionary).T

Here is the bash script that is being called:

"find_accuracies.sh":

#!/bin/sh

COMMIT=$1
git show $COMMIT:blahblahfolder/blahblah.txt | grep --line-buffered 'accuracy'

acc_data is a dataframe with nrows=len(out), indexed by the unique commits, but for each acc_type the value is exactly the same in every row.

For example, my output looks like this: [screenshot not included]

How can I call the file "find_accuracies.sh" with the subprocess command and have it flush the unique values of each file for each commit?

Chris
  • Doesn't help your problem, but note that you are not calling a `bash` script since the header is `#!/bin/sh`, so you are calling `sh`, not `bash` - yes, there are huge differences, even if `sh` is linked to `bash`. – cdarke Sep 15 '18 at 07:16
  • See https://stackoverflow.com/questions/107705/disable-output-buffering – cdarke Sep 15 '18 at 07:18

1 Answer


I hope this helps address the immediate problem you're seeing. You should really use communicate with subprocess.PIPE here, as it waits for the command to finish and gives you all of its output:

outputstring = p2.communicate()[0]

You can also use a convenience function like check_output to the same effect:

outputstring = subprocess.check_output(['~/Desktop/find_accuracies.sh', commit],
                                       encoding='utf-8', shell=True)

Or, in Python 3, run should also work:

p2 = subprocess.run(['~/Desktop/find_accuracies.sh', commit],
                    encoding='utf-8', shell=True, stdout=subprocess.PIPE)
outputstring = p2.stdout

Now a few more comments, hints and suggestions:

I am a little surprised this works for you, since combining shell=True with a list of arguments should (see the paragraph starting with "On POSIX with shell=True" in the subprocess documentation) make commit an argument of the underlying sh that wraps your script call, not of the script itself. In any case, you can (and I would suggest you do) drop the shell entirely and leave HOME resolution to Python:

from pathlib import Path
executable = Path.home().joinpath('Desktop/find_accuracies.sh')

p2 = subprocess.run([executable, commit],
                    encoding='utf-8', stdout=subprocess.PIPE)
outputstring = p2.stdout

You can (or, for Python < 3.5, must) also use os.path.expanduser('~/Desktop/find_accuracies.sh') instead of Path.home() to get the script path. On the other hand, for Python >= 3.7 you could replace stdout=subprocess.PIPE with capture_output=True.
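To make the shell=True point above more concrete, here is a minimal sketch (POSIX only) of what happens when a list is combined with shell=True. The function name shell_positional_demo and the echo command are made up for illustration. The first list item is run as the shell command, and the remaining items become positional parameters of the shell itself, starting at $0:

```python
import subprocess

def shell_positional_demo(value):
    """Show where an extra list item lands when shell=True is used (POSIX).

    Popen effectively runs: /bin/sh -c '<first item>' <remaining items...>
    so `value` becomes $0 of the shell, and $1 stays empty.
    """
    result = subprocess.run(
        ['echo "arg0=$0 arg1=$1"', value],   # hypothetical demo command
        shell=True, encoding='utf-8', stdout=subprocess.PIPE,
    )
    return result.stdout

print(shell_positional_demo('abc123'))
```

If the same thing happens with the call in the question, the script would be started without any argument, so $COMMIT would be empty in every iteration. As far as I know, git show :path then resolves to the version of the file in the index, which might explain why every commit ends up with identical values.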

And last but not least: it seems a bit unnecessary to call a bash script (especially one double-wrapped in an sh call, as in the original example) just to run git through grep when we already have a Python script to process the information. I would try running the corresponding git command directly, capturing the bulk of its output, and then processing that output in the Python script itself to extract the bits of interest.
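A rough sketch of that last suggestion, under a few assumptions: the helper names parse_accuracies and accuracies_for_commit and the repo_dir parameter are my own, the file path is the placeholder from the question, and the lines of interest are assumed to look like "name: value" and to contain the word "accuracy" (mirroring the grep in the script):

```python
import subprocess

def parse_accuracies(text):
    """Collect 'name: value' lines that mention 'accuracy' into a dict of floats."""
    result = {}
    for line in text.splitlines():
        if 'accuracy' not in line:       # same filter as grep 'accuracy'
            continue
        name, sep, value = line.partition(': ')
        if sep:                          # skip lines without a ': ' separator
            result[name] = float(value)
    return result

def accuracies_for_commit(commit, path='blahblahfolder/blahblah.txt', repo_dir='.'):
    """Run git directly (no shell, no wrapper script) and parse its output."""
    completed = subprocess.run(
        ['git', 'show', f'{commit}:{path}'],
        cwd=repo_dir, encoding='utf-8', stdout=subprocess.PIPE,
        check=True,                      # raise if git fails, e.g. unknown commit
    )
    return parse_accuracies(completed.stdout)
```

With these in place, the dictionary from the question could be built as {commit: accuracies_for_commit(commit) for commit in out} and handed to pd.DataFrame.from_dict(..., orient='index').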

Ondrej K.