I am trying to count the lines that match each of two patterns in a very large file and store the counts in variables, using only Bash shell commands.
Currently, I scan the very large file twice, using a separate grep statement each time, like so:
$ cat test.txt
first example line one
first example line two
first example line three
second example line one
second example line two
$ FIRST=$( cat test.txt | grep 'first example' | wc --lines ; ) ; ## first run
$ SECOND=$(cat test.txt | grep 'second example' | wc --lines ; ) ; ## second run
and I end up with this:
$ echo $FIRST
3
$ echo $SECOND
2
Ideally, I want to scan the large file only once. I have never used Awk and would rather not use it.
Piping through tee is new to me. It seems that using tee to feed the file into two separate grep statements may mean that we only have to scan the large file once.
I would also like to do this without creating any temporary files and then having to remember to delete them.
I have tried several variations, such as the following:
FIRST=''; SECOND='';
cat test.txt \
|tee >(FIRST=$( grep 'first example' | wc --lines ;);) \
>(SECOND=$(grep 'second example' | wc --lines ;);) \
>/dev/null ;
and using read:
FIRST=''; SECOND='';
cat test.txt \
|tee >(grep 'first example' | wc --lines | (read FIRST); ); \
>(grep 'second example' | wc --lines | (read SECOND); ); \
> /dev/null ;
cat test.txt \
| tee <( read FIRST < <(grep 'first example' | wc --lines )) \
<( read SECOND < <(grep 'second example' | wc --lines )) \
> /dev/null ;
and with curly brackets:
FIRST=''; SECOND='';
cat test.txt \
|tee >(FIRST={$( grep 'first example' | wc --lines ;)} ) \
>(SECOND={$(grep 'second example' | wc --lines ;)} ) \
>/dev/null ;
but none of these save the line counts into the variables FIRST and SECOND.
Is this even possible to do?
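For what it is worth, here is a minimal sketch of one possible single-pass approach, assuming Bash with process substitution and no Awk. Commands inside >( ... ), <( ... ), or a pipeline run in subshells, so assignments made there never reach the calling shell. The sketch instead lets a single grep scan the input once and counts the matching lines in a loop that runs in the current shell. The function sample_input and the variable line are hypothetical names used only for illustration; sample_input stands in for the large file.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for the large file: prints the sample lines from the question.
sample_input() {
    printf '%s\n' \
        'first example line one' \
        'first example line two' \
        'first example line three' \
        'second example line one' \
        'second example line two'
}

FIRST=0
SECOND=0

# grep reads the input once and passes on only lines matching either pattern.
# The loop is fed by a process substitution (not a pipe), so it runs in the
# current shell and the counters are still set after the loop finishes.
while IFS= read -r line; do
    if [[ $line == *'first example'* ]]; then
        FIRST=$((FIRST + 1))
    fi
    if [[ $line == *'second example'* ]]; then
        SECOND=$((SECOND + 1))
    fi
done < <(sample_input | grep -E 'first example|second example')

echo "$FIRST"
echo "$SECOND"
```

With a real file, the last line of the loop would become something like done < <(grep -E 'first example|second example' test.txt). The Bash loop only sees lines that grep has already matched, so it stays reasonably fast when matches are a small fraction of the file, and no temporary files are created.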