
Assume an input table (intable.csv) that contains ID numbers in its second column, and a fresh output table (outtable.csv) into which each line of the input file, extended by one column, is to be written.

echo -ne "foo,NC_045043\nbar,NC_045193\nbaz,n.a.\nqux,NC_045054\n" > intable.csv
echo -n "" > outtable.csv

Further assume that one or more third-party commands (here: esearch, efetch; both part of Entrez Direct) are employed to retrieve additional information for each ID number. This additional info is to form the third column of the output table.

while IFS="" read -r line || [[ -n "$line" ]]
do
    echo -n "$line" >> outtable.csv
    NCNUM=$(echo "$line" | awk -F"," '{print $2}')
    if [[ $NCNUM == NC_* ]]
    then
        echo "$NCNUM"
        RECORD=$(esearch -db nucleotide -query "$NCNUM" | efetch -format gb)
        echo "$RECORD" | grep "^LOCUS" | awk '{print ","$3}' | \
          tr -d "\n" >> outtable.csv
    else
        echo ",n.a." >> outtable.csv
    fi
done < intable.csv

Why does the while loop iterate only over the first input table entry under the above code, whereas it iterates over all input table entries if the code lines starting with RECORD and echo "$RECORD" are commented out? How can I correct this behavior?

Michael Gruenstaeudl
  • This would happen if `esearch` reads from standard input, since it will read the rest of `intable.csv` – Barmar May 06 '21 at 18:28
  • You don't need the `|| [[ -n "$line" ]]` hack if your input file properly ends with a newline character. – chepner May 06 '21 at 18:34
  • @chepner Okay, noted for future reference. – Michael Gruenstaeudl May 06 '21 at 18:37
  • For your own sake to avoid shooting yourself in the foot at some point please read [correct-bash-and-shell-script-variable-capitalization](https://stackoverflow.com/questions/673055/correct-bash-and-shell-script-variable-capitalization). – Ed Morton Apr 03 '22 at 14:29

1 Answer

This would happen if esearch reads from standard input. It will inherit the input redirection from the while loop, so it will consume the rest of the input file.
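The effect can be reproduced without Entrez Direct. The following is a minimal sketch in which `cat` stands in for any command that reads standard input; the temporary file and the counter names are made up for illustration and are not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical three-line input file; stands in for intable.csv.
tmpfile=$(mktemp)
printf 'a\nb\nc\n' > "$tmpfile"

# Case 1: a command inside the loop reads stdin.
# It inherits the loop's redirection and drains the rest of the file.
count_leaky=0
while IFS= read -r line
do
    count_leaky=$((count_leaky + 1))
    cat > /dev/null
done < "$tmpfile"

# Case 2: the same command, with its own stdin pointed at /dev/null.
count_fixed=0
while IFS= read -r line
do
    count_fixed=$((count_fixed + 1))
    cat < /dev/null > /dev/null
done < "$tmpfile"

echo "without redirect: $count_leaky iteration(s)"
echo "with </dev/null:  $count_fixed iteration(s)"
rm -f "$tmpfile"
```

In the first loop, `read` takes the first line and `cat` then consumes the remaining two, so the next `read` hits end of file and the loop stops after one pass. In the second loop, `cat` never touches the file and all three lines are processed.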

The solution is to redirect its standard input elsewhere, e.g. from /dev/null.

while IFS="" read -r line || [[ -n "$line" ]]
do
    echo -n "$line" >> outtable.csv
    NCNUM=$(echo "$line" | awk -F"," '{print $2}')
    if [[ $NCNUM == NC_* ]]
    then
        echo "$NCNUM"
        RECORD=$(esearch -db nucleotide -query "$NCNUM" </dev/null | efetch -format gb)
        echo "$RECORD" | grep "^LOCUS" | awk '{print ","$3}' | \
          tr -d "\n" >> outtable.csv
    else
        echo ",n.a." >> outtable.csv
    fi
done < intable.csv
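
An alternative that is not part of the answer above, shown only as a sketch: feed the loop through a separate file descriptor (3 here, chosen arbitrarily) so that commands in the loop body never see the table on their standard input. As before, `cat` stands in for a stdin-reading command, and the temporary file, the counter name, and the here-string are made up to keep the example self-contained.

```shell
#!/usr/bin/env bash
# Hypothetical three-line input file; stands in for intable.csv.
tmpfile=$(mktemp)
printf 'a\nb\nc\n' > "$tmpfile"

count_fd3=0
# read takes its input from fd 3; the loop's stdin is an unrelated
# here-string, so cat has something harmless to consume.
while IFS= read -r line <&3
do
    count_fd3=$((count_fd3 + 1))
    cat > /dev/null    # reads the here-string, not the table
done 3< "$tmpfile" <<< "unrelated input"

echo "fd 3 loop: $count_fd3 iteration(s)"
rm -f "$tmpfile"
```

Applied to the loop in the question, this would amount to writing `read -r line <&3` and `done 3< intable.csv`, leaving the `esearch` line unchanged. Whether this or `</dev/null` is preferable depends on whether the commands in the loop body ever need the script's real standard input.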
Barmar