I have a text file that looks like this:
>long_name
AAC-TGA
>long_name2
CCTGGAA
And a list of column numbers: 2, 4, 7
. Of course I can have these as a variable like:
cols="2 4 7"
I need to replace every column of the rows that don't start with >
with a single character, e.g an N
, to result in:
>long_name
ANCNTGN
>long_name2
CNTNGAN
Additional details - the file has ~200K lines. All lines that don't start with >
are the same length. Line indices will never exceed the length of the non >
lines.
It seems to me that some combination of sed and awk must be able to do this quickly, but I cannot for the life of me figure out how to link it all together.
E.g. I can use sed to work on all lines that don't start with a >
like this (in this case replacing all spaces with N
's):
sed -i.bak '/^[^>]/s/ /N/g' input.txt
And I can use AWK to replace specific columns of lines as I want to like this (I think...):
awk '$2=N'
But I am struggling to stitch this together