How can I insert multiple new line characters within a file using python?

Question

I have a fasta file that does not contain any return characters. The file looks something like this:

>Sequence_ID(Num1)AAAAAAAAAAAAAAAAAAATTTTTTTAAAAA>Seqence_ID(Num2)AAAAAAATTTTTTTAAAATTTAATTTAATTATTAT>Sequence_ID (Num3)AAATTTTATTAGGAGGGA and so on for many lines.

I have been trying to write a Python program that reads this file and inserts a newline character after every sequence ID and after every sequence. I am hoping the output would look like this:

>Sequence_ID(Num1) AAAAAAAAAAAAAAAAAAATTTTTTTAAAAA
>Seqence_ID(Num2) AAAAAAATTTTTTTAAAATTTAATTTAATTATTAT
>Sequence_ID (Num3)AAATTTTATTAGGAGGGA

So far I have this:

input = open('LG_allseqs.txt', 'r')

output = open('LG_Seqs.txt', 'w')

for line in input.readlines():

    if line == '>':
        output.write('\n' + line)
    else:
        output.write(line)

There are no error messages (the syntax is "correct"), but the program does not produce the output I want. Any suggestions would be very much appreciated.

str.replace(old, new) will help; see http://stackoverflow.com/questions/9189172/python-string-replace — goutham2027, May 22 '13 at 18:15
How large is the FASTA file? Can the entire file be read into memory? — unutbu, May 22 '13 at 18:16
For the output to be valid FASTA you need `\n` after the ID, as well. — Lev Levitsky, May 22 '13 at 18:19
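
Following up on the last comment, here is a minimal sketch that also breaks the line after each ID. It assumes every ID starts with '>' and ends at the first ')', which holds for the sample in the question but is not a general FASTA rule. The names RECORD and to_fasta are made up for illustration.

```python
import re

# Assumption: an ID runs from '>' to the first ')', as in the sample
# from the question. Real FASTA headers need not follow this pattern.
RECORD = re.compile(r'(>[^>)]*\))([^>]*)')


def to_fasta(text):
    """Put each ID and each sequence on its own line."""
    lines = []
    for header, sequence in RECORD.findall(text):
        lines.append(header)
        lines.append(sequence.strip())
    return '\n'.join(lines) + '\n'
```

If the IDs in the real file end differently, the pattern has to be adapted; a bare replace on '>' cannot tell where an ID stops and its sequence begins.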

score 2 · Answer 1 · answered May 22 '13 at 18:17

It sounds like you are confusing "lines" with "characters". If everything is on a single line, then read it as a single string (using read, not readlines), and then write out \n-separated lines:

inputtext = input.read()  # 'input' is not a very good name for a variable, btw
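# split('>') drops the '>' markers and yields an empty first piece, so re-add '>' and skip empties
output.write('\n'.join('>' + record for record in inputtext.split('>') if record))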
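
The same idea as a self-contained sketch, with the file handling included. The function names split_records and convert_file are hypothetical; the file names in the usage comment are the ones from the question.

```python
def split_records(text):
    """Return text with every '>'-prefixed record on its own line."""
    # split('>') removes the '>' markers, so one is put back in front of
    # each piece; the empty piece before the first '>' is skipped.
    pieces = [piece for piece in text.strip().split('>') if piece]
    return '\n'.join('>' + piece for piece in pieces) + '\n'


def convert_file(src_path, dst_path):
    """Read the whole input file as one string and write the split version."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        dst.write(split_records(src.read()))


# Usage with the question's file names:
# convert_file('LG_allseqs.txt', 'LG_Seqs.txt')
```

This reads the entire file into memory, which is fine as long as the file fits there.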

score 0 · Answer 2 · Jerome · May 22 '13 at 18:26

You are not replacing any characters in your loop. Try the following loop:

for line in input.readlines():
    output.write(line.replace('>', '\n'))

You mentioned in the comment below that you want to keep the '>' character. Try the following loop to do that:

for line in input.readlines():
    output.write(line.replace('>', '\n>'))
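
One detail worth checking: if the text itself starts with '>', this replacement also puts a newline in front of the very first record, so the output begins with an empty line. A small sketch of one way to avoid that, assuming the input has no leading newlines of its own (the function name break_before_markers is made up):

```python
def break_before_markers(text):
    """Start a new line before every '>' without an empty first line."""
    # replace() also inserts '\n' before the first '>', so strip it again.
    return text.replace('>', '\n>').lstrip('\n')
```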

edited May 22 '13 at 18:26

answered May 22 '13 at 18:17

Jerome

Fantastic, thank you very much. That separated the sequences as I hoped it would. The only problem is that I do not intend to replace the '>' character, I was hoping to keep that character in the beginning of each line. Thank you very much for your help – user2410720 May 22 '13 at 18:20
output.write(line.replace('>', '\n>')) – James Thiele May 22 '13 at 18:26
don't you mean output.write(line.replace('>', '\n>')) ? – tike Jun 18 '13 at 16:15
@flexy The original question did not require the > be kept. That is why there are two code blocks. Maybe I am misunderstanding what you mean? – Jerome Jun 19 '13 at 04:16
As far as I see it, the question required it to stay in; look at the quoted desired output. The '>' character is actually part of the FASTA file format, as far as I know. – tike Jun 19 '13 at 11:58

score 0 · Answer 3 · answered May 22 '13 at 18:19

This could be a solution for you:

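# 'with' closes both files, which also makes sure the output is flushed to disk
with open('LG_allseqs.txt', 'r') as src, open('LG_Seqs.txt', 'w') as dst:
    dst.write(src.read().replace(">", "\n>"))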

And a demo of replace (the output starts with an empty line because the string starts with '>'):

>>> x = """>Sequence_ID(Num1)AAAAAAAAAAAAAAAAAAATTTTTTTAAAAA>Seqence_ID(Num2)AAAAAAATTTTTTTAAAATTTAATTTAATTATTAT>Sequence_ID (Num3)AAATTTTATTAGGAGGGA and so on for many lines."""
>>> print(x.replace(">", "\n>"))

>Sequence_ID(Num1)AAAAAAAAAAAAAAAAAAATTTTTTTAAAAA
>Seqence_ID(Num2)AAAAAAATTTTTTTAAAATTTAATTTAATTATTAT
>Sequence_ID (Num3)AAATTTTATTAGGAGGGA and so on for many lines.
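
The read() call in this answer loads the whole file into memory. If the file is too large for that (a comment on the question asks about this), the same replacement can be applied chunk by chunk. Because only a single character is being matched, a chunk boundary can never split a match. This is a sketch only; the function name insert_newlines and the default chunk size are arbitrary choices.

```python
def insert_newlines(src, dst, chunk_size=64 * 1024):
    """Copy src to dst, inserting a newline before every '>'."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            # An empty string means the end of the file was reached.
            break
        dst.write(chunk.replace('>', '\n>'))


# Usage with the question's file names:
# with open('LG_allseqs.txt') as src, open('LG_Seqs.txt', 'w') as dst:
#     insert_newlines(src, dst)
```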
