I have a "source.fasta" file that contains information in the following format:
>TRINITY_DN80_c0_g1_i1 len=723 path=[700:0-350 1417:351-368 1045:369-722] [-1, 700, 1417, 1045, -2]
CGTGGATAACACATAAGTCACTGTAATTTAAAAACTGTAGGACTTAGATCTCCTTTCTATATTTTTCTGATAACATATGGAACCCTGCCGATCATCCGATTTGTAATATACTTAACTGCTGGATAACTAGCCAAAAGTCATCAGGTTATTATATTCAATAAAATGTAACTTGCCGTAAGTAACAGAGGTCATATGTTCCTGTTCGTCACTCTGTAGTTACAAATTATGACACGTGTGCGCTG
>TRINITY_DN83_c0_g1_i1 len=371 path=[1:0-173 152:174-370] [-1, 1, 152, -2]
GTTGTAAACTTGTATACAATTGGGTTCATTAAAGTGTGCACATTATTTCATAGTTGATTTGATTATTCCGAGTGACCTATTTCGTCACTCGATGTTTAAAGAAATTGCTAGTGTGACCCCAATTGCGTCAGACCAAAGATTGAATCTAGACATTAATTTCCTTTTGTATTTGTATCGAGTAAGTTTACAGTCGTAAATAAAGAATCTGCCTTGAACAAACCTTATTCCTTTTTATTCTAAAAGAGGCCTTTGCGTAGTAGTAACATAGTACAAATTGGTTTATTTAACGATTTATAAACGATATCCTTCCTACAGTCGGGTGAAAAGAAAGTATTCGAAATTAGAATGGTTCCTCATATTACACGTTGCTG
>TRINITY_DN83_c0_g1_i2 len=218 path=[1:0-173 741:174-217] [-1, 1, 741, -2]
GTTGTAAACTTGTATACAATTGGGTTCATTAAAGTGTGCACATTATTTCATAGTTGATTTGATTATTCCGAGTGACCTATTTCGTCACTCGATGTTTAAAGAAATTGCTAGTGTGACCCCAATTGCGTCAGACCAAAGATTGAATCTAGACATTAATTTCCTTTTGTATTTGTACCGAGTAAGTTTCCAGTCGTAAATAAAGAATCTGCCAGATCGGA
>TRINITY_DN99_c0_g1_i1 len=326 path=[1:0-242 221:243-243 652:244-267 246:268-325] [-1, 1, 221, 652, 246, -2]
ATCGGTACTATCATGTCATATATCTAGAAATAATACCTACGAATGTTATAAGAATTTCATAACATGATATAACGATCATACATCATGGCCTTTCGAAGAAAATGGCGCATTTACGTTTAATAATTCCGCGAAAGTCAAGGCAAATACAGACCTAATGCGAAATTGAAAAGAAAATCCGAATATCAGAAACAGAACCCAGAACCAATATGCTCAGCAGTTGCTTTGTAGCCAATAAACTCAACTAGAAATTGCTTATCTTTTATGTAACGCCATAAAACGTAATACCGATAACAGACTAAGCACACATATGTAAATTACCTGCTAAA
>TRINITY_DN90_c0_g1_i1 len=1240 path=[1970:0-527 753:528-1239] [-1, 1970, 753, -2]
GTCGATACTAGACAACGAATAATTGTGTCTATTTTTAAAAATAATTCCTTTTGTAAGCAGATTTTTTTTTTCATGCATGTTTCGAGTAAATTGGATTACGCATTCCACGTAACATCGTAAATGTAACCACATTGTTGTAACATACGGTATTTTTTCTGACAACGGACTCGATTGTAAGCAACTTTGTAACATTATAATCCTATGAGTATGACATTCTTAATAATAGCAACAGGGATAAAAATAAAACTACATTGTTTCATTCAACTCGTAAGTGTTTATTTAAAATTATTATTAAACACTATTGTAATAAAGTTTATATTCCTTTGTCAGTGGTAGACACATAAACAGTTTTCGAGTTCACTGTCG
>TRINITY_DN84_c0_g1_i1 len=301 path=[1:0-220 358:221-300] [-1, 1, 358, -2]
ACTATTATGTAGTACCTACATTAGAAACAACTGACCCAAGACAGGAGAAGTCATTGGATGATTTTCCCCATTAAAAAAAGACAACCTTTTAAGTAAGCATACTCCAAATTAAGGTTTAATTAGCTAAGTGAGCGCGAAAAATGATCAAATATACCGACGTCCATTTGGGGCCTATCCTTTTTAGTGTTCCTAATTGAAATCCTCACGTATACAGCTAGTCACTTTTAAATCTTATAAACATGTGATCCGTCTGCTCATTTGGACGTTACTGCCCAAAGTTGGTACATGTTTCGTACTCACG
>TRINITY_DN84_c0_g1_i2 len=301 path=[1:0-220 199:221-300] [-1, 1, 199, -2]
ACTATTATGTAGTACCTACATTAGAAACAACTGACCCAAGACAGGAGAAGTCATTGGATGATTTTCCCCATTAAAAAAAGACAACCTTTTAAGTAAGCATACTCCAAATTAAGGTTTAATTAGCTAAGTGAGCGCGAAAAATGATCAAATATACCGACGTCCATTTGGGGCCTATCCTTTTTAGTGTTCCTAATTGAAATCCTCACGTATACAGCTAGTCAGCTAACCAAAGATAAGTGTCTTGGCTTGGTATCTACAGATCTCTTTTCGTAATTTCGTGAGTACGAAACATGTACCAACT
>TRINITY_DN72_c0_g1_i1 len=434 path=[412:0-247 847:248-271 661:272-433] [-1, 412, 847, 661, -2]
GTTAATTTAGTGGGAAGTATGTGTTAAAATTAGTAAATTAGGTGTTGGTGTGTTTTTAATATGAATCCGGAAGTGTTTTGTTAGGTTACAAGGGTACGGAATTGTAATAATAGAAATCGGTATCCTTGAGACCAATGTTATCGCATTCGATGCAAGAATAGATTGGGAAATAGTCCGGTTATCAATTACTTAAAGATTTCTATCTTGAAAACTATTTCTAATTGGTAAAAAAACTTATTTAGAATCACCCATAGTTGGAAGTTTAAGATTTGAGACATCTTAAATTTTTGGTAGGTAATTTTAAGATTCTATCGTAGTTAGTACCTTTCGTTCTTCTTATTTTATTTGTAAAATATATTACATTTAGTACGAGTATTGTATTTCCAATATTCAGTCTAATTAGAATTGCAAAATTACTGAACACTCAATCATAA
>TRINITY_DN75_c0_g1_i1 len=478 path=[456:0-477] [-1, 456, -2]
CGAGCACATCAGGCCAGGGTTCCCCAAGTGCTCGAGTTTCGTAACCAAACAACCATCTTCTGGTCCGACCACCAGTCACATGATCAGCTGTGGCGCTCAGTATACGAGCACAGATTGCAACAGCCACCAAATGAGAGAGGAAAGTCATCCACATTGCCATGAAATCTGCGAAAGAGCGTAAATTGCGAGTAGCATGACCGCAGGTACGGCGCAGTAGCTGGAGTTGGCAGCGGCTAGGGGTGCCAGGAGGAGTGCTCCAAGGGTCCATCGTGCTCCACATGCCTCCCCGCCGCTGAACGCGCTCAGAGCCTTGCTCATCTTGCTACGCTCGCTCCGTTCAGTCATCTTCGTGTCTCATCGTCGCAGCGCGTAGTATTTACG
There are close to 400,000 sequences in this file.
I have another file ids.txt in the following format:
>TRINITY_DN14840_c10_g1_i1
>TRINITY_DN8506_c0_g1_i1
>TRINITY_DN12276_c0_g2_i1
>TRINITY_DN15434_c5_g3_i1
>TRINITY_DN9323_c8_g3_i5
>TRINITY_DN11957_c1_g7_i1
>TRINITY_DN15373_c1_g1_i1
>TRINITY_DN22913_c0_g1_i1
>TRINITY_DN13029_c4_g5_i1
I have 100 sequence ids in this file. When I match these ids to the source file I want an output that gives me the match for each of these ids with the entire sequence.
For example, for an id:
>TRINITY_DN80_c0_g1_i1
I want my output to be:
>TRINITY_DN80_c0_g1_i1
CGTGGATAACACATAAGTCACTGTAATTTAAAAACTGTAGGACTTAGATCTCCTTTCTATATTTTTCTGATAACATATGGAACCCTGCCGATCATCCGATTTGTAATATACTTAACTGCTGGATAACTAGCCAAAAGTCATCAGGTTATTATATTCAATAAAATGTAACTTGCCGTAAGTAACAGAGGTCATATGTTCCTGTTCGTCACTCTGTAGTTACAAATTATGACACGTGTGCGCTG
I want all hundred sequences in this format. I used this code:
while read p; do
echo ''$p >> out.fasta
grep -A 400000 -w $p source.fasta | sed -n -e '1,/>/ {/>/ !{'p''}} >> out.fasta
done < ids.txt
But my output is different in that only the last id has a sequence and the rest dont have any sequence associated:
>TRINITY_DN14840_c10_g1_i1
>TRINITY_DN8506_c0_g1_i1
>TRINITY_DN12276_c0_g2_i1
....
>TRINITY_DN10309_c6_g3_i1
>TRINITY_DN6990_c0_g1_i1
TTTTTTTTTTTTTGTGGAAAAACATTGATTTTATTGAATTGTAAACTTAAAATTAGATTGGCTGCACATCTTAGATTTTGTTGAAAGCAGCAATATCAACAGACTGGACGAAGTCTTCGAATTCCTGGATTTTTTCAGTCAAGAGATCAACAGACACTTTGTCGTCTTCAATGACACACATGATCTGCAGTTTGTTGATACCATATCCAACAGGTACAAGTTTGGAAGCTCCCCAGAGGAGACCATCCATTTCGATGGTGCGGACCTGGTTTTCCATTTCTTTCATGTCTGTTTCATCATCCCATGGCTTGACGTCAAGGATTATAGATGATTTAGCAATGAGAGCAGGTTTCTTCGATTTTTTGTCAGCATAAGCTTTCAGACGTTCTTCACGAATTCTGGCGGCCTCTGCATCCTCTTCCTCGTCGCCAGATCCGAATAGGTCGACGTCATCATCGTCGTCATCCTTAGCAGCGGGTGCAGGTGCTGTGGTGGTCTTTCCGCCAGCGGTCAGAGGGCTAGCTCCAGCCGCCCAGGATTTGCGCTCCTCGGCATTGTAGGAGGCAATCTGGTTGTACCACCGGAGAGCGTGGGGCAAGCTTGCGCTCGGGGCCTTGCCGACTTGTTGGAACACTTGGAAATCGGCTTGAGTTGGTGTGTAACCTGACACATAACTCTTATCAGCTAAGAAATTGTTAAGCTCATTAAGGCCTTGTGCGGTTTTAACGTCTCCTACTGCCATTTTTATTTAAAAAAGTAGTTTTTTTCGAGTAATAGCCACACGCCCCGGCACAATGTGAGCAAGAAGGAATGAAAAAGAAATCTGACATTGACATTGCCATGAAATTGACTTTCAAAGAACGAATGAATTGAACTAATTTGAACGG
I am only getting the desired output for the 100th id from my ids.txt. Could someone help me on where my script is wrong. I would like to get all 100 sequences when i run the script. Thank you
I have added google drive links to the files i am working with: ids.txt