2

I'm running a bash script using some software which follows the basic pattern below.

while read sample; do
    software ${sample} > output.txt
done <samples.txt

For certain samples this message is printed: "The site Pf3D7_02_v3:274217 overlaps with another variant, skipping..."

This message does not stop the software running, but it makes the results wrong. So if the message appears, I'd like to stop the software and have the while loop move on to the next sample. There are lots of samples in samples.txt, which is why I can't do this manually. A way of denoting which sample the message is for would also help. As it is, I just get many lines of that message without knowing which iteration of the loop produced each one.

Is it possible to help with this?

Fyi the program I'm using is called bcftools consensus. Do let me know if I need to give more information.

Edit: added "> output.txt" - realised I'd stripped it down too much

Edit 2: Here is the full piece of script using a suggestion by chepner below. Sorry it's a bit arduous:

mkfifo p
while IFS= read -r sample; do
    bcftools consensus --fasta-ref $HOME/Pf/MSP2_3D7_I_region_ref_noprimer.fasta --sample ${sample} --missing N $EPHEMERAL/bam/Pf_eph/MSP2_I_PfC_Final/Pf_60_public_Pf3D7_02_v3.final.normalised_bcf.vcf.gz --output ${sample}_MSP2_I_consensus_seq.fasta | tee p &
    grep -q -m 1 "The site Pf3D7_02_v3" p && kill $!
done <$HOME/Pf/Pf_git/BF_Mali_samples.txt
rm p
  • Capture the output of the software in a variable, and depending on the output, stop the process software, and continue while loop. What do you mean by "makes the result false"? The command `software` returns false? – stephanmg Nov 13 '19 at 15:32
  • @stephanmg No sorry I see that's quite unclear - I just mean the results I get are wrong, not that they're false in a boolean sort of way. – Annie Forster Nov 13 '19 at 15:47
  • Okay, I can now envision a solution. Note my previous one was incorrect, thanks to @chepner. – stephanmg Nov 13 '19 at 16:23

2 Answers

3

I would use a named pipe to grep the output as it is produced.

mkfifo p
while IFS= read -r sample; do
    software "$sample" > p &
    tee output.txt < p | grep -q -m 1 "The site Pf3D7_02_v3:274217" && kill $!
done < samples.txt
rm p

software will write its output to the named pipe in the background, but block until tee starts reading. tee will read from the pipe and write that data both to your output file and to grep. If grep finds a match, it will exit and cause kill to terminate software (if it has not already terminated).
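The flow described above can be tried without the real program. This is only a sketch: the `software` function is a hypothetical stand-in that prints the warning for a sample called `bad`, and the temporary directory, the `pid` variable and `flagged.txt` are names made up for the demo. It also records which sample triggered the message, which the question asked for.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real program: it emits the warning
# only for the sample named "bad".
software() {
    echo "processing $1"
    if [ "$1" = "bad" ]; then
        echo "The site Pf3D7_02_v3:274217 overlaps with another variant, skipping..."
    fi
    echo "finished $1"
}

workdir=$(mktemp -d)                 # scratch directory for the demo
trap 'rm -rf "$workdir"' EXIT
mkfifo "$workdir/p"

for sample in good bad; do
    # The writer blocks in the background until tee opens the pipe.
    software "$sample" > "$workdir/p" &
    pid=$!
    if tee "$workdir/$sample.out" < "$workdir/p" | grep -q -m 1 "The site Pf3D7_02_v3"; then
        kill "$pid" 2>/dev/null      # no-op if the writer already exited
        echo "$sample" >> "$workdir/flagged.txt"
    fi
    wait "$pid" 2>/dev/null          # reap the writer before reusing the pipe
done

echo "flagged: $(cat "$workdir/flagged.txt")"
```

One thing to keep in mind: the shell waits for every command in the `tee | grep` pipeline, so `kill` only runs once `tee` has exited too, which happens when it next tries to write to the closed pipe or reaches end of input. Also, the pattern only sees what the program writes to standard output; if the real program prints the message on standard error, that stream would need to be redirected into the pipe as well.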

If your version of grep doesn't support the -m option (it's common, but non-standard), you can use awk instead.

tee output.txt < p | awk '/The site Pf3D7_02_v3:274217/ { found = 1; exit } END { exit !found }' && kill $!
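For awk to stand in for `grep -q -m 1` in front of `&& kill $!`, it has to exit with status 0 when the line is seen and non-zero otherwise. A small sketch that checks this behaviour on its own (the function name `has_warning` is made up for the example):

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) as soon as the warning is seen,
# fails (exit 1) if input ends without it.
has_warning() {
    awk '/The site Pf3D7_02_v3/ { found = 1; exit } END { exit !found }'
}

if printf 'ok\nThe site Pf3D7_02_v3:274217 overlaps with another variant, skipping...\n' | has_warning; then
    echo "warning seen"
fi

if ! printf 'ok\nstill ok\n' | has_warning; then
    echo "no warning"
fi
```

The `exit` in the main rule stops reading immediately and jumps to `END`, where the final status is set.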
chepner
  • Today I rediscovered named pipes. Thanks for the great example @chepner. – stephanmg Nov 13 '19 at 16:30
  • Shouldn't it likely be `>> output.txt`? – Benjamin W. Nov 13 '19 at 17:00
  • Possibly, though the OP is using `>`. I got the impression that any previous output from a single run of `software` was effectively invalidated in the event the special message is encountered, so I actually imagine something like `if grep -qm1 "..." p; then kill $!; else : do something with good output.txt; fi`. – chepner Nov 13 '19 at 17:04
  • @chepner thanks for the suggestion but it hasn't worked: the software was not stopped for any samples where the message is printed, and it all ran as before. Just in case I have not given enough information, I've put the whole command line with your suggestion (as used for the particular software) in my post. – Annie Forster Nov 14 '19 at 16:33
  • Is it possible that `software` is simply finishing before `grep` has a chance to see and react to the message? – chepner Nov 14 '19 at 16:35
  • Can you try the updated answer? I'm not sure if putting the pipeline in the background interfered with the value of `$!`; this might work better, and if not, there's still the possibility of buffering between `tee` and `grep` that could cause an issue. – chepner Nov 14 '19 at 16:39
1
while read -u3 sample; do
    software ${sample} | 
    tee output.txt |
    { grep -q -m 1 "The site Pf3D7_02_v3:274217" && cat <&3 >/dev/null; }
done 3< samples.txt

The input file is opened on file descriptor 3, so the loop reads sample names from there rather than from standard input. The idea is to consume everything left on descriptor 3 once the specified text is detected. Because the output goes to a file anyway, it is easy to tee it into output.txt and have grep check the stream for the string. If grep succeeds, cat <&3 reads the rest of the input, so the next read -u3 fails and the loop ends.
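The draining trick can be seen in isolation with a throwaway list. This is just an illustration; the file contents and the variable names `list`, `item` and `seen` are invented for the demo.

```shell
#!/usr/bin/env bash
list=$(mktemp)
printf '%s\n' a b c d > "$list"

seen=""
while IFS= read -r -u3 item; do
    seen="${seen:+$seen }$item"
    if [ "$item" = "b" ]; then
        # Swallow the remaining lines on fd 3; the next read hits end of file.
        cat <&3 >/dev/null
    fi
done 3< "$list"

echo "processed: $seen"
rm -f "$list"
```

Here the loop stops after `b`, because `cat` shares the same open file and leaves nothing for the next `read`. Note that this ends the whole loop rather than skipping a single sample.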

Or:

while read sample; do
    if 
        software ${sample} | 
        tee output.txt |
        grep -q -m 1 "The site Pf3D7_02_v3:274217"
    then
        break;
    fi
done < samples.txt

Because the exit status of a pipeline is the exit status of its last command, we can just check whether grep succeeds and then break out of the loop.
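The question asks to move on to the next sample and to know which sample produced the message, so the same exit-status check could drive a `continue` instead of a `break`. The sketch below makes two assumptions that are not confirmed anywhere in this thread: that the message is written to standard error, and that a per-sample output file is wanted. The `software` function, the sample names and `skipped.txt` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in: result on stdout, warning on stderr for sample "s2".
software() {
    echo "result for $1"
    if [ "$1" = "s2" ]; then
        echo "The site Pf3D7_02_v3:274217 overlaps with another variant, skipping..." >&2
    fi
}

outdir=$(mktemp -d)
trap 'rm -rf "$outdir"' EXIT
printf '%s\n' s1 s2 s3 > "$outdir/samples.txt"

while IFS= read -r sample; do
    # "2>&1 > file" sends stderr into the pipe and stdout into the file.
    if software "$sample" 2>&1 > "$outdir/$sample.out" |
        grep -q "The site Pf3D7_02_v3"
    then
        echo "$sample" >> "$outdir/skipped.txt"   # remember the offender
        rm -f "$outdir/$sample.out"               # discard the bad result
        continue
    fi
    echo "kept $sample"
done < "$outdir/samples.txt"

echo "skipped: $(cat "$outdir/skipped.txt")"
```

Once `grep -q` has matched and exited, any further write by the program to the pipe fails, which normally terminates it, so no explicit `kill` is used here.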

KamilCuk