
I have a text file called file.txt that contains the following entries:

healthy
healthy
healthy
healthy
healthy
unhealthy
initial
healthy
initial
healthy

I count the number of healthy, initial and unhealthy entries in this file using the commands below:

grep -c healthy file.txt
grep -c unhealthy file.txt
grep -c initial file.txt

Now I want a loop condition in a shell script that does this for me:

while [ $(grep -c "healthy" file.txt) -lt 6 -a $(grep -c "unhealthy" file.txt) != 0 -a $(grep -c "initial" file.txt) != 0 ]
do
bla bla bla
done

This file is dynamic: its entries keep changing as part of some other script. I want the loop to keep running as long as the count of healthy in the file is less than or equal to 6, the count of unhealthy is above 0, and the count of initial is above 0. While that holds it should do something; otherwise it should exit the loop. I am not getting the syntax right. Any help here would be greatly appreciated.

Mat
Ashley
  • The first problem is that `grep -c healthy` will also match "unhealthy". You should use `grep -c '\bhealthy'` to prevent that. – joanis Aug 01 '19 at 20:03
  • Thanks joanis. But looks like my syntax is also not working for other conditions. do you mind providing the script that should eventually work – Ashley Aug 01 '19 at 20:04
  • Also, can you specify in which cases it's not doing what you want, and quote error messages you get, if any? – joanis Aug 01 '19 at 20:04
  • When I test your `while` loop, having fixed the problem I pointed out above, it works exactly as you describe. Note that your sample file has 7 "healthy" so the condition `healthy < 6` is false right off the bat. – joanis Aug 01 '19 at 20:09
  • So if I try, say, `$(grep -c "healthy" file.txt) -gt 6` on its own, it doesn't evaluate. If I just do `while [ $(grep -c "healthy" file.txt) -gt 6 ]`, it should enter the loop since the count is greater than 6, but it doesn't. The condition I put in the question is what I actually want to evaluate, but even this basic one didn't work when I played around with it. – Ashley Aug 01 '19 at 20:10
  • What version of bash are you using? – joanis Aug 01 '19 at 20:11
  • The shorter example in your comment works for me, just like the original. Also, what's your platform? – joanis Aug 01 '19 at 20:13
  • Ah my bad! i tested it carefully now and it works for all the use cases. Also your suggestion about '\bhealthy' really worked. Thanks again! – Ashley Aug 01 '19 at 20:31
  • All good, glad I could help. – joanis Aug 01 '19 at 20:33
  • Note that `-a` and `-o` in test are flagged obsolescent in current versions of the POSIX `test` standard; search for the `OB` markers in https://pubs.opengroup.org/onlinepubs/9699919799/utilities/test.html. Instead of using them, run `[ ... ] && [ ... ]` to combine two tests. – Charles Duffy Aug 02 '19 at 00:29
  • @Ashley If our answers were helpful, are you willing to upvote and/or accept any of them? – joanis Aug 02 '19 at 23:56

3 Answers


The short answer

After a discussion in the comments above, OP and I established that the only real problem in the proposed loop was that grep -c healthy would also match unhealthy, but otherwise the loop already works as intended.

\b should be used to indicate word boundary, as in grep -c '\bhealthy', making the loop:

while [ $(grep -c '\bhealthy' file.txt) -lt 6 -a $(grep -c "unhealthy" file.txt) != 0 -a $(grep -c "initial" file.txt) != 0 ]
do
   bla bla bla
done

EDIT: As @IanW pointed out in the comments, you can also use grep -c -w word instead of adding \b, which will be like adding \b before and after each word.
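
To make the difference concrete, here is a small sketch that recreates the sample data from the question in a temporary file (the file and variable names are just for illustration) and compares a plain substring count with a whole-word count. It assumes a grep that supports `-w`, which GNU and BSD grep do:

```shell
#!/bin/sh
# Recreate the sample data from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' healthy healthy healthy healthy healthy \
    unhealthy initial healthy initial healthy > "$sample"

plain=$(grep -c healthy "$sample")     # substring match, also counts "unhealthy"
word=$(grep -c -w healthy "$sample")   # whole-word match only

echo "plain=$plain word=$word"         # prints: plain=8 word=7
rm -f "$sample"
```

The plain count is one higher because the single "unhealthy" line also contains the substring "healthy".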

Making it future proof

It is also worth repeating @CharlesDuffy's recommendation above to avoid -a and -o since they are flagged obsolescent, preferring [ ... ] && [ ... ] instead. This is a good choice for long-term stable code.

So now the loop would look like this:

while [ $(grep -c '\bhealthy' file.txt) -lt 6 ] && [ $(grep -c "unhealthy" file.txt) != 0 ] && [ $(grep -c "initial" file.txt) != 0 ]
do
   bla bla bla
done
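
If the condition gets reused, one option is to wrap it in a function and loop on that. The following is only a sketch: `should_keep_looping` is a made-up name, the test data is invented, and it uses `grep -w` rather than `\b` for the whole-word match (an assumption that your grep supports `-w`):

```shell
#!/bin/sh
# Hypothetical helper: returns 0 (success) while the loop should keep running.
should_keep_looping() {
    [ "$(grep -c -w healthy "$1")" -lt 6 ] &&
    [ "$(grep -c unhealthy "$1")" -ne 0 ] &&
    [ "$(grep -c initial "$1")" -ne 0 ]
}

status_file=$(mktemp)

# Two healthy, one unhealthy, one initial: all three tests pass.
printf '%s\n' healthy healthy unhealthy initial > "$status_file"
if should_keep_looping "$status_file"; then first=yes; else first=no; fi

# No "initial" entries left: the third test fails.
printf '%s\n' healthy healthy unhealthy > "$status_file"
if should_keep_looping "$status_file"; then second=yes; else second=no; fi

echo "first=$first second=$second"     # prints: first=yes second=no
rm -f "$status_file"
```

The loop itself would then read `while should_keep_looping file.txt; do ...; done`.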

Or making it bash specific

And finally, I want to note that if this is going to be executed specifically in bash and not sh, [[ ... ]] is an option. In bash, [ is itself a builtin, so no external test program is called, but [[ ... ]] is a shell keyword that bash parses specially: unquoted expansions inside it do not undergo word splitting or pathname expansion. [[ ... ]] is my personal preference, but it is not part of POSIX and is not available in every shell. It supports a syntax I find nicer and is often simpler to use, in particular because it does not require quoting variables all the time. See double vs single square brackets in bash for an interesting discussion on the topic.

So my own preferred format would be:

while [[ $(grep -c '\bhealthy' file.txt) -lt 6 && $(grep -c "unhealthy" file.txt) != 0 && $(grep -c "initial" file.txt) != 0 ]]
do
   bla bla bla
done
joanis
  • why choose `grep -c '\bword'` vs `grep -c -w 'word'` ? – Ian W Aug 03 '19 at 09:26
  • @IanW That's a good idea. I could give you a boring answer like "it's not POSIX compliant, while `\b` is", but that would be dishonest because I don't usually care about POSIX compliance... – joanis Aug 03 '19 at 14:45
  • very well then. However, now that I read the complete man page, why/when should the OP use grep -c '\bword' vs grep -c '\wword' ? Not being difficult, just wanting to understand nuances! It's a v. good answer. – Ian W Aug 03 '19 at 21:23
  • @IanW Doesn't `\w` match an actual letter? It's not a boundary marker, as far as I know. Glad you like my answer. – joanis Aug 03 '19 at 22:02
  • wrt `"it's not POSIX compliant, while \b is"` in the comment above - `\b` isn't POSIX compliant either, see https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html for the POSIX definition of a regular expression. – Ed Morton Aug 04 '19 at 03:30
  • @EdMorton Thank you for that link! When I searched, I found a page that claimed to document a basic POSIX regex library, but it was obviously not strictly POSIX. I had not found the standard itself. So I guess using `-w` is no worse than using `\b`, then, for future-proofness of the solution. I just edited my answer to remove the misleading note about POSIX compliance. – joanis Aug 04 '19 at 15:23

You're going about this the wrong way. This should be your starting point:

$ awk '{c[$1]++} END{for (i in c) print i, c[i]}' file
healthy 7
initial 2
unhealthy 1

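As for the conditions you want to act on, you can just write them, exiting with status 0 (success) while the condition holds so that a shell while loop keeps running: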

$ awk '
    { c[$1]++ }
    END { exit ( (c["healthy"] <= 6) && (c["unhealthy"] > 0) && (c["initial"] > 0) ? 0 : 1 ) }
' file
$ echo $?
1

$ awk '
    { c[$1]++ }
    END { exit ( (c["healthy"] <= 8) && (c["unhealthy"] > 0) && (c["initial"] > 0) ? 0 : 1 ) }
' file
$ echo $?
0

and use them as:

while awk '...' file; do
    your stuff
done
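
Spelled out in full, that could look something like the sketch below. It assumes the loop should keep running while the condition from the question holds, so awk exits with 0 when the condition is true. The function name `condition_holds` and its second argument (the maximum healthy count) are invented for this example:

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 while the condition holds, 1 otherwise.
#   $1 = status file, $2 = maximum allowed count of "healthy"
condition_holds() {
    awk -v max="$2" '
        { c[$1]++ }
        END { exit ( (c["healthy"] <= max) && (c["unhealthy"] > 0) && (c["initial"] > 0) ? 0 : 1 ) }
    ' "$1"
}

# The sample data from the question: 7 healthy, 1 unhealthy, 2 initial.
sample=$(mktemp)
printf '%s\n' healthy healthy healthy healthy healthy \
    unhealthy initial healthy initial healthy > "$sample"

condition_holds "$sample" 6; with_6=$?   # 7 healthy is more than 6: status 1
condition_holds "$sample" 8; with_8=$?   # 7 healthy is within 8: status 0

echo "with_6=$with_6 with_8=$with_8"     # prints: with_6=1 with_8=0
rm -f "$sample"
```

The loop would then be `while condition_holds file.txt 6; do ...; done`, with a single pass over the file per iteration.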

Whatever else you want to do is likewise trivial, efficient, portable, and robust given the above starting point.

Ed Morton

How big is your file? If it's very large and you're scanning it with grep three times, that might make your script unnecessarily slow.

You can count the matches with one pass through the file using AWK:

# One awk pass prints the three counts; "+0" turns a missing count into 0.
read -r u_count h_count i_count < <(awk '{arr[$1]++} END {print arr["unhealthy"]+0, arr["healthy"]+0, arr["initial"]+0}' file.txt)
while (( h_count < 6 && u_count != 0 && i_count != 0 ))

This will work as long as the data file looks like the example you posted or even if there are other whitespace delimited fields after those words. If those words aren't the first ones on each line, then the AWK script can be modified appropriately.

Unless these counts are changing inside the loop, you might just want to use an if instead of a while.
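
If the counts do change while the loop runs, they have to be read again on every iteration. Here is a rough POSIX sh sketch of that idea. The helper name `count_states`, the test data, and the `sed` step that stands in for "some other script" changing the file are all made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print "unhealthy healthy initial" counts from one awk pass.
count_states() {
    awk '{n[$1]++} END {print n["unhealthy"]+0, n["healthy"]+0, n["initial"]+0}' "$1"
}

status_file=$(mktemp)
printf '%s\n' healthy healthy unhealthy initial > "$status_file"

iterations=0
while :; do
    # Refresh the counts at the top of every iteration.
    read -r u_count h_count i_count <<EOF
$(count_states "$status_file")
EOF
    if ! { [ "$h_count" -lt 6 ] && [ "$u_count" -ne 0 ] && [ "$i_count" -ne 0 ]; }; then
        break
    fi
    iterations=$((iterations + 1))
    # Stand-in for the other script: the unhealthy entry recovers.
    sed 's/^unhealthy$/healthy/' "$status_file" > "$status_file.tmp" &&
        mv "$status_file.tmp" "$status_file"
done

echo "iterations=$iterations unhealthy=$u_count healthy=$h_count"
rm -f "$status_file"
```

With this test data the body runs once. On the second check the unhealthy count is 0, so the loop exits.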

Dennis Williamson