When I do this with awk it's relatively fast, even though it's Row By Agonizing Row (RBAR). I tried to make a quicker more elegant bug resistant solution in Bash that would only have to make far fewer passes through the file. It takes probably 10 seconds to get through the first 1,000 lines with bash using this code. I can make 25 passes through all million lines of file with awk in about the same time! How come bash is several orders of magnitude slower?
while read line
do
FIELD_1=`echo "$line" | cut -f1`
FIELD_2=`echo "$line" | cut -f2`
if [ "$MAIN_REF" == "$FIELD_1" ]; then
#echo "$line"
if [ "$FIELD_2" == "$REF_1" ]; then
((REF_1_COUNT++))
fi
((LINE_COUNT++))
if [ "$LINE_COUNT" == "1000" ]; then
echo $LINE_COUNT;
fi
fi
done < temp/refmatch