Add words at the beginning and end of the same FASTA header line with sed

Question

I have the following FASTA record:

>A_1000
ACTTTCGATCTCTTGTAGATCTGTTCTC...CAC
ACTTTCGATCTCTTGTAGATCTGTTCTC...CAC

I would like to convert the first line as follows:

>Initialword/A_1000/Finalword
ACTTTCGATCTCTTGTAGATCTGTTCTC...CAC
ACTTTCGATCTCTTGTAGATCTGTTCTC...CAC

I found a similar question whose answer let me add text to the beginning and end of the header as I needed (Add words at beginning and end of a FASTA header line with sed). However, it puts Finalword on the next line.

I ran the following:

sed -E 's%^>(.*)%>Initialword/\1/Finalword%' input.fasta > output.fasta

Viewing the output with less shows:

>Initialword/A_0101M/Finalword 
ACTTTCGATCTCTTGTAGATCTGTTCTC...CACM
ACTTTCGATCTCTTGTAGATCTGTTCTC...CACM

But when I open the output file in a program like Notepad, it looks like:

>Initialword/A_0101 
/Finalword 
ACTTTCGATCTCTTGTAGATCTGTTCTC...CAC
ACTTTCGATCTCTTGTAGATCTGTTCTC...CAC

How can I fix this so the text is added only at the beginning and end of the header line? And what is the M at the end of each line in the file?

Thank you

Does it __have to__ be sed? Why not use other tools? `Which returns: ... But in the Fasta file` Is "Fasta file" some graphical program? _Where_ does it "look" like that? Does your input file have DOS line endings? — KamilCuk, Aug 26 '21 at 21:39
It doesn't have to be sed; that is just the tool I started with. FASTA files are what DNA sequencers return, and they have to be in this format to submit to online repositories. I just need to change the header name (the text following >) to match the submission guidelines. There are hundreds to thousands of headers in these files, so changing them by hand is error-prone. As for where: viewing the output file on the command line with less gives what I showed under `Which returns:`, but opening it in a program like Notepad gives what I showed under `But in the Fasta file`. — Keah Chambers, Aug 27 '21 at 12:40
@KeahChambers if you are interested in FASTA formatting, I suggest you use a text editor like BBEdit, which supports grep find/replace. This lets you create simple regular expressions to incrementally modify headers however you see fit. It doesn't have as much utility as command line sed/grep, but it should be significantly easier to use. Even basic regex substitutions can be very helpful. — Ghoti, Aug 29 '21 at 23:55
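Since the comments note that sed isn't required, here is a rough awk equivalent, offered as a sketch that assumes any POSIX awk (gawk, mawk, or BSD awk). `relabel_headers_awk` is a hypothetical wrapper name, and `Initialword`/`Finalword` are the placeholders from the question.

```shell
# Hypothetical helper: rewrites FASTA headers with POSIX awk.
relabel_headers_awk() {
  awk '
    { sub(/\r$/, "") }    # drop a trailing carriage return (DOS line ending)
    /^>/ { $0 = ">Initialword/" substr($0, 2) "/Finalword" }    # header lines only
    { print }             # print every line, modified or not
  ' "$@"
}
```

Use it as `relabel_headers_awk input.fasta > output.fasta`. Stripping the carriage return first is what keeps `/Finalword` on the header line.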

Cyrus · Accepted Answer · 2021-08-26T22:25:32.057


First convert your file from DOS (CRLF) to Unix (LF) line endings with dos2unix, then use GNU sed:

dos2unix <input.fasta | sed -E 's%^>(.*)%>Initialword/\1/Finalword%' >output.fasta
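If dos2unix isn't installed, a minimal sketch of the same fix in a single sed call: strip the trailing carriage return from every line first, then rewrite the header. This assumes GNU sed, since `\r` in a regex is a GNU extension; `relabel_headers` is a hypothetical wrapper name.

```shell
# Hypothetical helper; assumes GNU sed (\r is a GNU sed extension).
relabel_headers() {
  # 1) s/\r$// removes the DOS carriage return from every line.
  # 2) The header substitution then captures only the ID, not the \r.
  sed -E 's/\r$//; s%^>(.*)%>Initialword/\1/Finalword%' "$@"
}
```

Use it as `relabel_headers input.fasta > output.fasta`. Because the `\r` is removed before the header substitution runs, `(.*)` no longer captures it, so `/Finalword` stays on the header line.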


answered Aug 26 '21 at 22:12


This resolved the issue. Thank you. I am assuming my file had DOS line endings that I was unaware of. – Keah Chambers Aug 27 '21 at 12:46
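To confirm DOS line endings before converting, here is a small sketch that counts lines ending in a carriage return; a nonzero count means the file has CRLF endings. `count_crlf_lines` is a hypothetical helper. The `M` in the question is most likely that carriage return (`\r`), which less typically displays as `^M`.

```shell
# Hypothetical helper: counts lines that end in a carriage return.
count_crlf_lines() {
  # printf '\r' yields a literal CR; "\$" anchors it at end of line.
  # || true keeps a zero count from returning a failing exit status.
  grep -c "$(printf '\r')\$" "$@" || true
}
```

Run `count_crlf_lines input.fasta`. For a byte-level view, `head -n 1 input.fasta | od -c` shows `\r` followed by `\n` at the end of a CRLF line.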
