I have consensus sequences of a virus with a three-segment genome (the segments are named L, M and S), so each genome FASTA file contains three records that look like this:
>Toscana_virus_L_(consensus)_(consensus)
TTAACCATTCATCCCCTGAGGAGGTATGAATCATCAATTTATGACACTCCAATACCAGCC
..
>Toscana_virus_M_(consensus)_(consensus)
AATATACTATTATTTCAGAGATAGGGAACGGCACTAGAACTTCCTTTTTAGAAGCTTGGG
..
>Toscana_virus_S_(consensus)_(consensus)
NNACAAAGACCTCCCGTATTGCTAAACCAGAACTAATAATAGACTTCTAGACAGCCATGC
..
I want to replace the headers in each FASTA file with the proper sample name.
My sample names (taken from the file names) look like this:
LCR_1152
LCR367, etc.
So this is what I did:
cp *.fasta to_rename/
mkdir renamed
cd to_rename
for filename in *.fasta; do
    filename2=$(echo "$filename" | sed 's/.*\(LCR_?\).*\([0-9][0-9][0-9][0-9]?$\).*/\1\2/')
    # Replace every header line with the sample name, copy sequence lines unchanged
    awk -v a="$filename2" '/^>/{print ">"a; next}{print}' < "$filename" > "../renamed/$filename"
done
It worked, but now the three segments inside each file have the same header, so I lost the distinction between L, M and S.
For example, this is what I get:
>LCR_1152
TTAACCATTCATCCCCTGAGGAGGTATGAATCATCAATTTATGACACTCCAATACCAGCC
..
>LCR_1152
AATATACTATTATTTCAGAGATAGGGAACGGCACTAGAACTTCCTTTTTAGAAGCTTGGG
..
>LCR_1152
NNACAAAGACCTCCCGTATTGCTAAACCAGAACTAATAATAGACTTCTAGACAGCCATGC
..
But what I want is the following:
>LCR_1152_L
TTAACCATTCATCCCCTGAGGAGGTATGAATCATCAATTTATGACACTCCAATACCAGCC
..
>LCR_1152_M
AATATACTATTATTTCAGAGATAGGGAACGGCACTAGAACTTCCTTTTTAGAAGCTTGGG
..
>LCR_1152_S
NNACAAAGACCTCCCGTATTGCTAAACCAGAACTAATAATAGACTTCTAGACAGCCATGC
..
That way the identity of each segment is not lost.
I don't know how to solve this; my attempts have been unsuccessful :(
Does anyone know how to work that out?
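A minimal sketch of one possible approach, assuming every original header has the form ">Toscana_virus_X_(consensus)_(consensus)", so that the segment letter is the third "_"-separated field. The function name rename_headers is hypothetical, and the sample name is passed in as an argument rather than derived from the file name:

```shell
# Sketch only: rewrite FASTA headers as ">sample_segment".
# Assumption: headers look like ">Toscana_virus_L_(consensus)_(consensus)".
rename_headers() {
    # $1 = sample name (e.g. LCR_1152), $2 = input FASTA file
    awk -v sample="$1" '
        /^>/ {
            # Split the header on "_"; field 3 holds the segment letter (L, M or S)
            split($0, parts, "_")
            print ">" sample "_" parts[3]
            next
        }
        { print }   # sequence lines pass through unchanged
    ' "$2"
}
```

Inside the existing loop this could be called as rename_headers "$filename2" "$filename" > "../renamed/$filename". If the headers do not always follow that pattern, the field index would need adjusting.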