I am trying to write a shell script that reads a text file line by line, greps a TSV file for each line, and then does various things with the output. I'm stuck on the grepping.
strains_list_test (one strain name per line):
p1.A2
p1.A3
p1.C5
p1.D11
p1.D2
p8.H2
The sh script so far:
CURRENT_DIR=$(pwd)
STRAINS=$(pwd)/strains_list_test
OUTPUT_DIR=$(pwd)/output
BLAST_FILE=$(pwd)/filtered2.tsv
# Goal: generate FASTA files for all strains in the input file.
cat $STRAINS | while read strain_name
do
echo "strain name is" $strain_name
grep '$strain_name' $BLAST_FILE | head   # single quotes: grep searches for the literal text $strain_name
done
The output looks like this:
strain name is p1.A2
strain name is p1.A3
strain name is p1.C5
strain name is p1.D11
strain name is p1.D2
strain name is p8.H2
So on every iteration, grep returns nothing. After searching online for answers I have tried '$strain_name', "$strain_name", "/$strain_name", "${strain_name}" and god knows what else, all to no avail.
The thing is, if I leave out the variable and just run grep "p8.H2" $BLAST_FILE | head (for example), I get the correct output. So that part works, but something is broken either in how I pass the variable to grep or in how the lines are read, even though the echo line seems to print the variable correctly.
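If a variable only looks right when echoed, a non-printing character may be hiding in it. The sketch below assumes the list file was saved with Windows (CRLF) line endings; the one-line temporary file is a hypothetical stand-in for strains_list_test. It shows three ways to make a trailing carriage return visible.

```bash
#!/bin/bash
# Hypothetical stand-in for strains_list_test, saved with CRLF line endings.
list=$(mktemp)
printf 'p8.H2\r\n' > "$list"

# read stops at \n; \r is not IFS whitespace, so it stays at the end of $name
read -r name < "$list"
echo "strain name is $name"       # looks normal: a terminal does not show \r
echo "length: ${#name}"           # prints 6, one more than the 5 visible characters
printf '%q\n' "$name"             # bash prints $'p8.H2\r', making the \r explicit
printf '%s\n' "$name" | od -c     # byte dump; \r appears just before the final \n
```

The printf '%q' form is the same notation bash -x uses in its trace output.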
EDIT: Multiple people have recommended I use double quotes instead. As I said above, I have already tried "$strain_name".
Here's an example TSV file.
ProtName p26.C10|protID 58.744 223 83 2 1 216 1 221 1.95e-59 234 100
ProtName p26.C10|protID 38.000 150 68 1 216 340 72 221 6.37e-14 85.5 100
ProtName p8.H2|protID 34.300 207 100 5 101 278 22 221 1.20e-12 81.3 100
ProtName p23.A4|protID 72.002 1718 453 4 340 2029 72 1789 0.0 2511 100
ProtName p23.A4|protID 58.744 223 83 2 1 216 1 221 1.95e-59 234 100
ProtName p23.A4|protID 38.000 150 68 1 216 340 72 221 6.37e-14 85.5 100
ProtName p23.A4|protID 34.300 207 100 5 101 278 22 221 1.20e-12 81.3 100
Here is the current code in its entirety.
#!/bin/bash
CURRENT_DIR=$(pwd)
STRAINS=$(pwd)/strains_list_test
OUTPUT_DIR=$(pwd)/output
BLAST_FILE=$(pwd)/filtered2_sample.tsv
cat $STRAINS | while read strain_name
do
echo "strain name is" $strain_name
grep "$strain_name" $BLAST_FILE | head
done
echo "testing non-variable grep"
grep "p8.H2" $BLAST_FILE | head
SECOND EDIT:
I ran the script with bash -x to get a trace of every command it executes.
This is what the trace shows for the echo line:
echo 'strain name is' $'p8.H2\r'
Maybe the \r is the reason grep isn't matching? Any ideas on how to fix this?
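A minimal sketch of one fix, assuming the \r comes from Windows (CRLF) line endings in strains_list_test: strip a trailing \r from each name before using it as a grep pattern. The demo files and the function name grep_strains are hypothetical; otherwise the loop mirrors the script above. It also uses grep -F, because in a regular expression the "." in a name like p1.A2 matches any character.

```bash
#!/bin/bash
# Hypothetical demo inputs standing in for strains_list_test and filtered2.tsv.
demo_dir=$(mktemp -d)
printf 'p1.A2\r\np8.H2\r\n' > "$demo_dir/strains_list_test"
printf 'ProtName\tp26.C10|protID\t58.744\n' >  "$demo_dir/filtered2.tsv"
printf 'ProtName\tp8.H2|protID\t34.300\n'   >> "$demo_dir/filtered2.tsv"

grep_strains() {
  local strains_file=$1 blast_file=$2 strain_name
  # IFS= and -r keep each line exactly as written; the || test also
  # processes a last line that has no trailing newline.
  while IFS= read -r strain_name || [ -n "$strain_name" ]; do
    strain_name=${strain_name%$'\r'}    # drop a trailing \r left by CRLF endings
    [ -z "$strain_name" ] && continue   # skip blank lines
    echo "strain name is $strain_name"
    # -F: match the name as a fixed string, not as a regular expression
    grep -F -- "$strain_name" "$blast_file" | head
  done < "$strains_file"
}

grep_strains "$demo_dir/strains_list_test" "$demo_dir/filtered2.tsv"
```

If you would rather fix the data than the loop, tr -d '\r' < strains_list_test > strains_list_unix writes a copy without carriage returns (strains_list_unix is just an example name), and the original while read loop should then work on that copy.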
THIRD EDIT: The grep definitely isn't the issue: I tried this instead:
for strain_name in p1.A2 p1.A3 p1.C5 p1.D11 p8.H2 p1.D2
So it works when I don't read the file and instead list the strains directly in the script. I don't want that in the final version, but it suggests something is wrong with the way strains_list_test is being read. (And no, before you ask, changing "while read" to "for ... in" alone didn't fix it.)
FOURTH EDIT: The above code works when I change strains_list_test from one name per line to a single line: p1.A2 p1.A3 p1.C5 p1.D11 p8.H2 p1.D2. So I found a way to do what I wanted. However, it's still not clear to me why the previous version wasn't working.
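One plausible explanation, assuming the file has CRLF line endings (as the \r in the bash -x trace suggests) and that the "for ... in" version read it with unquoted word splitting such as for strain_name in $(cat $STRAINS): word splitting breaks on spaces, tabs and newlines but not on \r. With one name per line, every name keeps a \r; with all names on one line, only the last name can. If the rewritten file has no final line break at all, no name does. The demo files and the count_cr function below are hypothetical.

```bash
#!/bin/bash
# Hypothetical demo: the same two names saved three ways.
demo=$(mktemp -d)
printf 'p1.A2\r\np8.H2\r\n' > "$demo/column.txt"       # one name per line, CRLF
printf 'p1.A2 p8.H2\r\n'    > "$demo/row.txt"          # one line, CRLF
printf 'p1.A2 p8.H2'        > "$demo/row_noeol.txt"    # one line, no line break

# Count the names that still end in \r after unquoted word splitting.
count_cr() {
  local n=0 name
  for name in $(cat "$1"); do    # unquoted on purpose: mimics the for ... in loop
    [[ $name == *$'\r' ]] && n=$((n + 1))
  done
  echo "$n"
}

echo "column:    $(count_cr "$demo/column.txt") names end in \\r"
echo "row:       $(count_cr "$demo/row.txt") names end in \\r"
echo "row_noeol: $(count_cr "$demo/row_noeol.txt") names end in \\r"
```

Under this assumption, in the single-line version only p1.D2 (the last name) would still carry a \r, so the other names would match.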