After searching this website for hours and trying many different things that didn't work, I have decided to post my own question. I currently have a text file (id.txt) that contains about 100 lines of the following IDS in this form:
5377-P3-D5-MSITS2a_R1reads1_1125821
5377-P3-D5-MSITS2a_R1reads1_1126992
I have a 7 GB fasta file with entries in the form
>5377-P3-D5-MSITS2a_R1reads1_1125821 M00532:203:000000000-BKM3D:1:1101:10654:16493 1:N:0:213 orig_bc=AAAAAAAAAAAA new_bc=AAAAAAAAAAAA bc_diffs=0
AAGTCGTAACAAGGTCTCCGTAGGTGAACCTGCGGAGGGATCATTACACAAATATGAAGGCGGGCTGGAACCTCTCGGGGTTACAGCCTTGCTGAATTATTCACCCTTGTCTTTTGCGTACTTCTTGTTTCCTTGGTGGGTTCGCCCACCACTAGGACAAACATAAACCTTTTGTATTGGCA
>5377-P3-D5-MSITS2a_R1reads1_1126992 M00532:203:000000000-BKM3D:1:1104:27124:5463 1:N:0:213 orig_bc=AAAAAAAAAAAA new_bc=AAAAAAAAAAAA bc_diffs=0
AAGTCGTAACAAGGTCTCCGTAGGTGAACCTGCGGAGGGATCATTACACAAATATGAAGGCGGGCTGGACCCTCTCGGGGTTACAGCCTTGCTGAATTATTCACCCTTGTCTTTTGCGTACATCTTGTTTCCTTTGTTGTTTCTCCCACCCCTAGGACAAACATAAACCTTTAGTAATTTCAATCAGCGT
>5377-P3-D5-MSITS2a_R1reads1_1129826 M00532:203:000000000-BKM3D:1:1110:14480:9405 1:N:0:213 orig_bc=AAAAAAAAAAAA new_bc=AAAAAAAAAAAA bc_diffs=0
AAGTCGTAACAAGGTCTCCGTAGGTGAACCTGCGGAGGGATCATTACACAAATATGAAGGCGGGCTGGAAACTCTCGAGGTTACAGCCTTGCTGAATTATTAACCCTTGTCGTTCGCGTACTTCTTGTTTCCTTGGTGTGTTCGCCCACCACAAGTAAAAACATAAACCTTTTGTAA
All IDs from id.text can be found in seq.fasta. The expected output would find the matching ID number in the fasta file from the id.text file and produce:
>5377-P3-D5-MSITS2a_R1reads1_1125821 M00532:203:000000000-BKM3D:1:1101:10654:16493 1:N:0:213 orig_bc=AAAAAAAAAAAA new_bc=AAAAAAAAAAAA bc_diffs=0
AAGTCGTAACAAGGTCTCCGTAGGTGAACCTGCGGAGGGATCATTACACAAATATGAAGGCGGGCTGGAACCTCTCGGGGTTACAGCCTTGCTGAATTATTCACCCTTGTCTTTTGCGTACTTCTTGTTTCCTTGGTGGGTTCGCCCACCACTAGGACAAACATAAACCTTTTGTATTGGCA
>5377-P3-D5-MSITS2a_R1reads1_1126992 M00532:203:000000000-BKM3D:1:1104:27124:5463 1:N:0:213 orig_bc=AAAAAAAAAAAA new_bc=AAAAAAAAAAAA bc_diffs=0
AAGTCGTAACAAGGTCTCCGTAGGTGAACCTGCGGAGGGATCATTACACAAATATGAAGGCGGGCTGGACCCTCTCGGGGTTACAGCCTTGCTGAATTATTCACCCTTGTCTTTTGCGTACATCTTGTTTCCTTTGTTGTTTCTCCCACCCCTAGGACAAACATAAACCTTTAGTAATTTCAATCAGCGT
Currently, I am able to extract one sequence from the fasta file at a time using grep in bash by just copying and pasting one ID from the file.
Ex: grep 5377-P3-D5-MSITS2a_R1reads1_1126992 seq.fasta -A 1
Result:
>5377-P3-D5-MSITS2a_R1reads1_1126992 M00532:203:000000000-BKM3D:1:1104:27124:5463 1:N:0:213 orig_bc=AAAAAAAAAAAA new_bc=AAAAAAAAAAAA bc_diffs=0 AAGTCGTAACAAGGTCTCCGTAGGTGAACCTGCGGAGGGATCATTACACAAATATGAAGGCGGGCTGGACCCTCTCGGGGTTACAGCCTTGCTGAATTATTCACCCTTGTCTTTTGCGTACATCTTGTTTCCTTTGTTGTTTCTCCCACCCCTAGGACAAACATAAACCTTTAGTAATTTCAATCAGCGT
However, I have multiple text files, each containing anywhere from 50-300 IDs, that I would like to use to extract sequences from the FASTA file, and singularly extracting sequences seems unnecessarily time-consuming. I would like to find a way to find and output the sequences from the fasta file for multiple IDs located in a separate text file. I have mainly experimented with awk and grep commands in bash, based primarily on other answers on this site, and almost every command I try produces no result and no error message.
Examples I have tried:
awk -F '>' 'NR==FNR{ids[$0]; next} NF>1{f=($2 in ids)}f' id.txt seq.fasta
awk 'NR==FNR{ids[$0];next} /^>/{f=($1 in ids)} f' id.txt seq.fasta
grep -Fwf id.txt seq.fasta
grep -Ff id.txt seq.fasta
I feel like I have tried many variations of these two commands (based on other stack overflow and biostar suggestions) and in bash, nothing happens, no result or no error message. I am a relative beginner at coding as well, so I cannot pinpoint exactly what is going wrong. I am also open to any python or other code that could be used as well. Any help or advice would be appreciated. Thanks!