I have a fasta file that does not contain any return characters. The file looks something like this:

>Sequence_ID(Num1)AAAAAAAAAAAAAAAAAAATTTTTTTAAAAA>Seqence_ID(Num2)AAAAAAATTTTTTTAAAATTTAATTTAATTATTAT>Sequence_ID (Num3)AAATTTTATTAGGAGGGA and so on for many lines.

I have been trying to write a Python program that reads this file and inserts a newline character at the end of every sequence ID and every sequence. I am hoping the output will look like this:

>Sequence_ID(Num1) AAAAAAAAAAAAAAAAAAATTTTTTTAAAAA
>Seqence_ID(Num2) AAAAAAATTTTTTTAAAATTTAATTTAATTATTAT
>Sequence_ID (Num3)AAATTTTATTAGGAGGGA

So far I have this:

input = open('LG_allseqs.txt', 'r')

output = open('LG_Seqs.txt', 'w')

for line in input.readlines():

    if line == '>':
        output.write('\n' + line)
    else:
        output.write(line)

There are no error messages (the syntax is "correct"), but I do not get the output I want. Any suggestions would be very much appreciated.

3 Answers

2

It sounds like you are confusing "lines" with "characters". If everything is on a single line, then read it as a single string (using read, not readlines), and then write out \n-separated lines:

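inputtext = input.read()  # 'input' shadows a built-in, so it is not a very good variable name, btw
# split('>') yields an empty first piece when the text starts with '>', so skip empty pieces
output.write('\n'.join('>' + record for record in inputtext.split('>') if record))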
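To make the lines-versus-characters point concrete, here is a minimal sketch that runs on an in-memory string instead of the real file. The shortened sample data and the helper name `split_records` are assumptions made for illustration, not part of the original question.

```python
import io

# Hypothetical, shortened sample mirroring the question: every record on one line.
raw = ">Sequence_ID(Num1)AAAATTTT>Seqence_ID(Num2)AAATTT>Sequence_ID (Num3)AAAGGG"

# readlines() yields a single element here, so comparing a "line" to '>'
# compares the entire file contents to one character and is never true.
lines = io.StringIO(raw).readlines()


def split_records(text):
    """Return the text with one '>'-prefixed record per line."""
    # split('>') produces an empty first piece when the text starts with '>',
    # so empty pieces are skipped before the '>' is put back.
    return "\n".join(">" + record for record in text.split(">") if record)


result = split_records(raw)
print(len(lines))
print(result)
```

The same function could be applied to the string returned by read() on the real input file.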
PaulMcG
0

You are not replacing any characters in your loop. Try the following loop:

for line in input.readlines():
    output.write(line.replace('>', '\n'))

You mentioned in the comment below that you wanted to keep the '>' character. Try the following loop to do that:

for line in input.readlines():
    output.write(line.replace('>', '\n>'))
Jerome
  • Fantastic, thank you very much. That separated the sequences as I hoped it would. The only problem is that I do not intend to replace the '>' character, I was hoping to keep that character in the beginning of each line. Thank you very much for your help – user2410720 May 22 '13 at 18:20
  • output.write(line.replace('>', '\n>')) – James Thiele May 22 '13 at 18:26
  • don't you mean output.write(line.replace('>', '\n>')) ? – tike Jun 18 '13 at 16:15
  • @flexy The original question did not require the > be kept. That is why there are two code blocks. Maybe I am misunderstanding what you mean? – Jerome Jun 19 '13 at 04:16
  • As far as I see it, the question required it to stay in; look at the quoted desired output. The '>' character is actually part of the FASTA file format, as far as I know. – tike Jun 19 '13 at 11:58
0

This could be a solution for you:

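with open('LG_allseqs.txt', 'r') as infile, open('LG_Seqs.txt', 'w') as outfile:
    # lstrip drops the blank line that replace() creates before the first '>'
    outfile.write(infile.read().replace(">", "\n>").lstrip("\n"))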

And a demo of replace:

>>> x = """>Sequence_ID(Num1)AAAAAAAAAAAAAAAAAAATTTTTTTAAAAA>Seqence_ID(Num2)AAAAAAATTTTTTTAAAATTTAATTTAATTATTAT>Sequence_ID (Num3)AAATTTTATTAGGAGGGA and so on for many lines."""
>>> print(x.replace(">", "\n>"))

>Sequence_ID(Num1)AAAAAAAAAAAAAAAAAAATTTTTTTAAAAA
>Seqence_ID(Num2)AAAAAAATTTTTTTAAAATTTAATTTAATTATTAT
>Sequence_ID (Num3)AAATTTTATTAGGAGGGA and so on for many lines.
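The question also asks for a newline after each sequence ID, which replace alone does not provide because nothing marks where an ID ends. The sketch below is one possible approach, and it assumes that every ID ends at the first ')' after its '>'. That rule is inferred from the sample data, not stated in the question. The names `RECORD` and `to_fasta` and the shortened sample are hypothetical.

```python
import re

# Assumption: a header runs from '>' through the first ')' that follows it,
# and the sequence is everything up to the next '>'.
RECORD = re.compile(r">([^>)]*\))([^>]*)")


def to_fasta(text):
    """Put each header and each sequence on its own line."""
    out = []
    for header, sequence in RECORD.findall(text):
        out.append(">" + header)
        out.append(sequence.strip())
    # Terminate every line, including the last, with a newline.
    return "".join(line + "\n" for line in out)


sample = ">Sequence_ID(Num1)AAAATTTT>Seqence_ID(Num2)AAATTT>Sequence_ID (Num3)AAAGGG"
fasta = to_fasta(sample)
print(fasta)
```

Records whose header has no ')' are silently skipped by this pattern, so it would need adjusting if the real IDs follow a different convention.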
Robert Lujo