I am wondering what the best way is to remove some lines from a FASTA file in bash.
In the example below, let's say I want to remove the line containing 'GUITH'. How do you remove this line and the sequence lines that follow it, up to the next '>' character?
FASTA file:
>B4KSI7_DROMO
RGLKRKPMALIKKLRKAKKEAPPNEKPEIVKTHLRNMIIVPEMTGSIIGVYNGKDFGQVE
VKPEMIGHYLGEFALTYKPVKH
>O46898_GUITH
RSLSKGPYIAAHLLKKLNNVDIQKPDVVIKTWSRSSTILPNMVGATIAVYNGKQHVPVYI
SDQMVGHKLGEFSPTRTFRSH
>Q7RT13_PLAYO
RGIDKKAKSLLKKLRKAKKECEVGEKPKPIPTHLRNMTIIPEMVGSIVAVHNGKQYTNVE
IKPEMIGYYLGEFSITYKHTRH
FASTA file after filtering with bash:
>B4KSI7_DROMO
RGLKRKPMALIKKLRKAKKEAPPNEKPEIVKTHLRNMIIVPEMTGSIIGVYNGKDFGQVE
VKPEMIGHYLGEFALTYKPVKH
>Q7RT13_PLAYO
RGIDKKAKSLLKKLRKAKKECEVGEKPKPIPTHLRNMTIIPEMVGSIVAVHNGKQYTNVE
IKPEMIGYYLGEFSITYKHTRH
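One way this might be done is with awk: set a flag on each header line and print lines only while the flag is set. This is only a sketch. The function name `remove_records` and the file names in the usage comment are placeholders, and it assumes the pattern should be matched as a plain substring of the header line.

```shell
# Sketch: drop every FASTA record whose header contains a given string.
# remove_records is a hypothetical helper name, not an existing tool.
# Usage: remove_records PATTERN [FILE...]   (reads stdin if no FILE is given)
remove_records() {
  pat=$1
  shift
  awk -v pat="$pat" '
    # On a header line, keep the record only if the pattern is absent.
    /^>/ { keep = (index($0, pat) == 0) }
    # Print header and sequence lines while the current record is kept.
    keep
  ' "$@"
}

# Example call (input.fasta and filtered.fasta are placeholder names):
# remove_records GUITH input.fasta > filtered.fasta
```

Because the flag only changes on lines starting with '>', all sequence lines of a removed record are skipped until the next header, however many lines the sequence spans.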
There is another, harder version of the question. Let's say you have a file with species names, species.txt:
DROMO;
PLAYO;
And you want to delete the records in the FASTA file whose species is not listed in species.txt. The output is the same as above, but the lines to erase are determined from the other file (without entering 'GUITH' directly). What would be the best way of doing that?
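A possible sketch for this second version, again with awk: read species.txt first into a lookup table, then keep only the records whose header ends in a listed species. The function name `keep_species` is a placeholder. It assumes each line of species.txt holds one name with an optional trailing ';', and that the species code is the part of the header ID after the last underscore.

```shell
# Sketch: keep only FASTA records whose species code is listed in a file.
# keep_species is a hypothetical helper name, not an existing tool.
# Usage: keep_species SPECIES_FILE FASTA_FILE
keep_species() {
  awk '
    # First file: the species list. Strip ";" and whitespace, store each name.
    FILENAME == ARGV[1] {
      gsub(/[;[:space:]]/, "")
      if ($0 != "") wanted[$0] = 1
      next
    }
    # FASTA header: take the first token (the ID), split it on "_",
    # and keep the record only if the last piece is a listed species.
    /^>/ {
      n = split($1, parts, "_")
      keep = (parts[n] in wanted)
    }
    # Print header and sequence lines while the current record is kept.
    keep
  ' "$1" "$2"
}

# Example call (file names are placeholders):
# keep_species species.txt input.fasta > filtered.fasta
```

Looking the species up in an array avoids building one long pattern from species.txt, and it compares whole species codes rather than substrings of the header.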