Appending filename at the end of certain lines in a text file

Question

I am trying to append a file name at the end of certain lines in many files which I am concatenating.

short example:

INPUTS:

filename (1): 1234_contigs.fasta
>NODE_STUFF
GATTACA

filename (2): 5678_contigs.fasta
>NODE_TUFF
TGTAATC

OUTPUT:

>NODE_STUFF-1234
GATTACA
>NODE_TUFF-5678
TGTAATC

The code that I am using as a scaffold for this was commandeered from another post and my most successful iterations upon it are:

for i in ./*/*contigs.fasta; do sed '/^>NODE.*/ s/$/-(basename $i _contigs.fasta)/' /g $i; done

>NODE_STUFF-(basename $i _contigs.fasta)
GATTACA
>NODE_TUFF-(basename $i _contigs.fasta)
TGTAATC


for i in ./*/*contigs.fasta; do sed s/'^>NODE.*'$/$(basename $i _contigs.fasta)\ /g $i; done
1234 
GATTACA
5678 
TGTAATC

While I see many similar questions, I am unable to find a way to do this for only certain lines in these files (which are functionally equivalent to .txt for this example). I believe my confused results are due to errors in handling literals, but after several dozen poorly recorded attempts at pushing quotation marks around I feel more lost than found. Note that each file can contain many lines starting with >NODE, and I wish to append the filename to each of them.

RavinderSingh13 · Answer 1 · 2022-05-25T18:48:18.567

3

With your shown samples, please try the following awk code. There is no need for a for loop to traverse the files; awk can read all of them by itself. In short: for each line that starts with >, print the line followed by - and the part of the current file name before the first _; print every other line unchanged.

awk '/^>/{file=FILENAME;sub(/_.*/,"",file);print $0"-"file;next} 1' *.fasta

Or, more concisely:

awk '/^>/{file=FILENAME;sub(/_.*/,"",file);$0=$0"-"file} 1' *.fasta
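
As a quick check, the second command can be run against the two sample files from the question. The sketch below recreates them in a scratch directory (the mktemp directory is an assumption for illustration, not part of the answer) and assumes the files sit in the current directory, so that FILENAME contains no path.

```shell
# Recreate the question's two sample files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>NODE_STUFF\nGATTACA\n' > 1234_contigs.fasta
printf '>NODE_TUFF\nTGTAATC\n'  > 5678_contigs.fasta

# Lines starting with ">" get "-<prefix>" appended, where <prefix> is the
# file name with everything from the first "_" onward removed.
# The trailing "1" is an always-true pattern that prints every line.
result=$(awk '/^>/{file=FILENAME; sub(/_.*/,"",file); $0=$0"-"file} 1' *.fasta)
printf '%s\n' "$result"
```

With the sample inputs this should reproduce the OUTPUT block from the question.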

edited May 25 '22 at 18:48

answered May 25 '22 at 18:30

RavinderSingh13

I think the OP wants to append the prefixing number of the filename; +1 – Fravadona May 25 '22 at 18:46
@Fravadona, oh, thank you for the nice catch, I think I am a bit sleepy :) cheers and thank you. – RavinderSingh13 May 25 '22 at 18:48
For any other new folks interested in a discussion of when to use either of these, please see these links: [when should I use sed and when should I use awk?](https://stackoverflow.com/questions/14229377/when-should-i-use-sed-and-when-should-i-use-awk), [what are the differences between perl python awk and sed](https://stackoverflow.com/questions/366980/what-are-the-differences-between-perl-python-awk-and-sed) – statlerNwaldorf May 25 '22 at 19:13
@RavinderSingh13 for getting the prefixing numbers: `gsub(/^.*\/|_.*$/,"",file)` instead of `sub` – Fravadona May 25 '22 at 19:49
@Fravadona, sure, for a complete path that makes sense, thank you. I will edit it in the morning, or if you want to edit it please feel free to; it's too late at night here, cheers – RavinderSingh13 May 25 '22 at 19:59
@statlerNwaldorf For the current use-case the difference is that `sed` isn't capable of doing the job with a single fork. Also, expanding shell variables inside a `sed` statement without escaping them might lead to unwanted behaviors. – Fravadona May 25 '22 at 19:59
@Fravadona, not disagreeing with you. I thought the first link in particular did a good job of elucidating why awk was a better solution for this, and it helped explain Ravinder's approach. I accepted leu's answer as my question tags had not been edited yet and it worked off the shelf. – statlerNwaldorf May 25 '22 at 22:04
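
For files reached through a path, as with the ./*/*contigs.fasta glob in the question, FILENAME includes the directories, so the sub call in the answer would leave them in the appended text. A minimal sketch of the gsub variant suggested in the comments, using hypothetical directory names sampleA and sampleB in a scratch directory:

```shell
# Sample files placed one directory down, mirroring the question's glob.
# sampleA and sampleB are made-up names for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p sampleA sampleB
printf '>NODE_STUFF\nGATTACA\n' > sampleA/1234_contigs.fasta
printf '>NODE_TUFF\nTGTAATC\n'  > sampleB/5678_contigs.fasta

# FILENAME is now something like ./sampleA/1234_contigs.fasta.
# gsub removes both the leading directories (^.*\/) and the tail that
# starts at the first remaining "_" (_.*$), leaving only the number.
result=$(awk '/^>/{file=FILENAME; gsub(/^.*\/|_.*$/,"",file); $0=$0"-"file} 1' \
    ./*/*_contigs.fasta)
printf '%s\n' "$result"
```

Because the file name is handled inside awk rather than interpolated into a sed script, characters such as / or & in a name need no escaping here.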

score 2 · Accepted Answer · answered May 25 '22 at 18:37

With bash and sed I'd propose:

for i in ./*/*contigs.fasta; do
   n=$(basename -s _contigs.fasta "$i")
   sed "s/^\(>NODE.*\)/\1-$n/" "$i"
done
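
The loop writes each modified file to standard output, so the concatenated result the question asks for can be captured by redirecting the loop as a whole. A sketch under a few assumptions: combined.fasta and the directories a and b are hypothetical names, and the two-argument form of basename from the question is used in place of -s, which should be equivalent here.

```shell
# Scratch setup with the question's sample data (directory names are made up).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p a b
printf '>NODE_STUFF\nGATTACA\n' > a/1234_contigs.fasta
printf '>NODE_TUFF\nTGTAATC\n'  > b/5678_contigs.fasta

# Redirecting after "done" collects the output of every sed call in one file.
# The glob only matches files one directory down, so combined.fasta in the
# current directory is never read back in as input.
for i in ./*/*contigs.fasta; do
   n=$(basename "$i" _contigs.fasta)
   sed "s/^\(>NODE.*\)/\1-$n/" "$i"
done > combined.fasta

cat combined.fasta
```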

leu

score 1 · Answer 3 · answered May 25 '22 at 18:59

Try:

for file in */*_contigs.fasta; do
    filenum=${file%_contigs.fasta}
    filenum=${filenum##*/}

    sed -- "s/^>NODE.*\$/&-${filenum}/" "$file"
done

See "Removing part of a string" in BashFAQ/100 (How do I do string manipulation in bash?) for an explanation of ${file%_contigs.fasta} and ${filenum##*/}.
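
The two expansions can also be tried on their own. A small sketch, where samples/1234_contigs.fasta is a hypothetical path chosen for illustration:

```shell
file='samples/1234_contigs.fasta'   # hypothetical path, not from the question

# ${var%pattern} removes the shortest match of pattern from the end.
filenum=${file%_contigs.fasta}      # samples/1234

# ${var##pattern} removes the longest match of pattern from the start,
# so "*/" strips every leading directory component.
filenum=${filenum##*/}              # 1234

printf '%s\n' "$filenum"
```

Both expansions are handled by the shell itself, so no external basename process is started for each file.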
