
I have the following lines in test.fa:

#test.fa
>1
AGAGGGAGCTG
CCTCAGGGCTG
CACTCAGGAAA
TTGGGGCGCTG
AGCATGGGGGG
CAGGAGGGGCC

I need to ignore the lines starting with ">" and concatenate the remaining lines into one single string. However, the following script skips not only the lines starting with ">" but also the line after each of them before concatenating the rest.

#!/usr/bin/env python
import sys
import re
string = ""
with open("test.fa","rt") as f:
       for line in f:
           if re.match(">",line):
              line = f.next()
           else:
              line = line.rstrip("\n")
              string = string + line
print (string)

Could anyone help fix the script, or suggest better ways to do it? Thanks!

harsh

3 Answers


There is no line counter to increment: the for loop already advances to the next line of the file on every iteration, so you don't need to do anything in the if block.

   for line in f:
       if re.match(">",line):
          pass
       else:
          line = line.rstrip("\n")
          string = string + line

Or

   for line in f:
       if not re.match(">",line):
          line = line.rstrip("\n")
          string = string + line

Additional enhancements: you don't need a regex to check which character a string starts with, and collecting the lines in a list and joining them once is generally recommended over repeatedly concatenating strings.

lines = []
for line in f:
    if not line.startswith(">"):
        lines.append(line.rstrip("\n"))
string = "".join(lines)

Or, as a one liner:

string = "".join(line.rstrip("\n") for line in f if not line.startswith(">"))
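
Putting these pieces together, here is a minimal self-contained sketch, assuming Python 3. The function name `read_sequence` and the temporary file are illustrative assumptions, not part of the original script; the sample data is a shortened copy of the question's test.fa.

```python
import os
import tempfile


def read_sequence(path):
    """Join all non-header lines of a FASTA-style file into one string."""
    with open(path, "rt") as f:
        # Skip header lines (those starting with ">") and strip newlines.
        return "".join(
            line.rstrip("\n") for line in f if not line.startswith(">")
        )


# Demonstration on a throwaway file holding the first lines of test.fa.
FASTA_TEXT = ">1\nAGAGGGAGCTG\nCCTCAGGGCTG\nCACTCAGGAAA\n"

with tempfile.NamedTemporaryFile("wt", suffix=".fa", delete=False) as tmp:
    tmp.write(FASTA_TEXT)
    tmp_path = tmp.name

try:
    sequence = read_sequence(tmp_path)
finally:
    os.remove(tmp_path)

print(sequence)
```

Wrapping the logic in a function keeps the file handling in one place, so the same code can be pointed at test.fa by calling `read_sequence("test.fa")`.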
Kevin
Thank you, everyone! All the solutions are similar. Thank you, Kevin, for the additional enhancements, which are helpful for any beginner. – harsh Oct 05 '15 at 16:49

You are effectively advancing the file twice for each header line: the for loop already fetches the next line on every iteration, and f.next() then fetches one more. I'd recommend going with this:

#!/usr/bin/env python
import sys
import re
string = ""
with open("test.fa","rt") as f:
       for line in f:
           if not re.match(">",line):
              line = line.rstrip("\n")
              string = string + line
print (string)
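
To see the double advance in isolation, here is a small sketch that reproduces the original bug on an in-memory file. It assumes Python 3, where the call is spelled `next(f)` rather than `f.next()`; the name `fake_file` and the shortened sample data are hypothetical.

```python
import io

# In-memory stand-in for test.fa (hypothetical, shortened data).
fake_file = io.StringIO(">1\nAAA\nCCC\nGGG\n")

kept = []
for line in fake_file:
    if line.startswith(">"):
        # This consumes "AAA"; the for loop then moves on to "CCC",
        # so "AAA" is never appended.
        line = next(fake_file)
    else:
        kept.append(line.rstrip("\n"))

print("".join(kept))  # prints "CCCGGG": the line after the header is lost
```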
d_kennetz
CollinD

You don't need this line:

line = f.next()

The for loop already advances the file iterator on every pass. Just do this:

#!/usr/bin/env python
import sys
import re

string = ""
with open("test.fa","rt") as f:
    for line in f:
        if not re.match(">",line):
            line = line.rstrip("\n")
            string = string + line
print (string)
HumanCatfood