I have a text file:

>name_1  
data_1  
>name_2  
data_2  
>name_3  
data_3  
>name_4    
data_4  
>name_5  
data_5  

I want to store the headers (name_1, name_2, ...) in one list and the data (data_1, data_2, ...) in another list in a Python program.

def parse_fasta_file(fasta):
    desc=[]    
    seq=[]    
    seq_strings = fasta.strip().split('>')  
    for s in seq_strings:  
        if len(s):  
            sects = s.split()  
            k = sects[0]  
            v = ''.join(sects[1:])  
    desc.append(k)  
    seq.append(v)    

  for l in sys.stdin:  
  data = open('D:\python\input.txt').read().strip()  
  parse_fasta_file(data)
  print seq   

This is the code I have tried, but I am not able to get the answer.

  • What have you written so far? Where are you getting stuck? – Daniel Timberlake Feb 18 '15 at 06:32
  • Have you even tried writing anything yet?? – biobirdman Feb 18 '15 at 06:32
  • I am new to python and have tried it.. i am able to do for one but not able to do when multiple files are there. – shahbaz khan Feb 18 '15 at 06:34
  • Show us your code and we can show you how to fix it. A blanket "how do I do this" leaves too many things to explain because we can't know what you don't know. – tripleee Feb 18 '15 at 06:47
  • def parse_fasta_file(fasta): seq_strings = fasta.strip().split('>') for s in seq_strings: if len(s): sects = s.split() k = sects[0] v = ''.join(sects[1:]) desc.append(k) seq.append(v) – shahbaz khan Feb 18 '15 at 06:48
  • I lifted that into the question. Please verify that the indentation came out correctly. You can't really post Python code in comments because whitespace is important. – tripleee Feb 18 '15 at 06:51
  • What do you pass in as `fasta`? A list of lines read from the file? – tripleee Feb 18 '15 at 06:51
  • possible duplicate of [parsing a fasta file using a generator ( python )](http://stackoverflow.com/questions/7654971/parsing-a-fasta-file-using-a-generator-python) – tripleee Feb 18 '15 at 06:55
  • See also https://pypi.python.org/pypi/pyfasta/ – tripleee Feb 18 '15 at 06:55
  • I am passing the fasta file in "fasta". – shahbaz khan Feb 18 '15 at 06:57
  • The file name, an opened file name, or the contents of the file as a list of strings, one per line? – tripleee Feb 18 '15 at 06:58
  • If the indentation is right, you have some trivial indentation errors. But please verify again, and see http://meta.stackexchange.com/questions/22186/how-do-i-format-my-code-blocks – tripleee Feb 18 '15 at 06:59

1 Answer

The most fundamental error is trying to access a variable outside of its scope.

def function(stuff):
    seq = stuff.upper()   # seq is local to function

function('data')
print(seq)   ############ error: seq is not defined here

You cannot access seq outside of function. The usual way to do this is to have function return a value, and capture it in a variable within the caller.

def function(stuff):
    seq = stuff.upper()
    return seq   # hand the value back to the caller

s = function('data')
print(s)

(I have deliberately used different variable names inside the function and outside. Outside function, you cannot access stuff or seq; inside it, you should work with the parameter stuff rather than reach for names from the caller such as s. Incidentally, it would be quite okay, but confusing to a beginner, to use a different variable with the same name seq in the mainline code.)
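To see the scoping rule in action, here is a minimal runnable sketch. The names make_seq and result are made up for illustration and are not part of the question's code; the behaviour shown (a NameError when a function's local name is used outside the function) is standard Python.

```python
def make_seq(stuff):
    # seq is local: it exists only while make_seq is running
    seq = stuff.upper()
    return seq

result = make_seq('acgt')   # capture the returned value in the caller
print(result)

try:
    seq                     # the function's local name is not visible here
except NameError as err:
    print('NameError: %s' % err)
```

The try/except is only there so the script keeps running after the failed lookup; in real code you would simply use the returned value.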

With that out of the way, we can attempt to write a function which returns a list of sequences and a list of descriptions for them.

def parse_fasta(lines):
    descs = []
    seqs = []
    data = ''
    for line in lines:
        if line.startswith('>'):
            if data:   # have collected a sequence, push to seqs
                seqs.append(data)
                data = ''
            descs.append(line[1:].strip())  # Trim '>' and surrounding whitespace
        else:
            data += line.strip()  # drop line endings and stray spaces
    # there will be yet one more to push when we run out
    if data:
        seqs.append(data)
    return descs, seqs

This isn't particularly elegant, but should get you started. A better design would be to return a list of (description, data) tuples where the description and its data are closely coupled together.

with open('file', 'r') as handle:   # the with block closes the file for us
    descriptions, sequences = parse_fasta(handle.read().splitlines())
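The tuple-based design mentioned above could look something like the following. This is only a sketch under a few assumptions: the function name parse_fasta_records and the inline sample are hypothetical, every record starts with a '>' line, and any text before the first header is ignored.

```python
def parse_fasta_records(lines):
    """Return a list of (description, sequence) tuples."""
    records = []
    desc = None      # description of the record being collected
    chunks = []      # sequence lines seen so far for that record
    for line in lines:
        line = line.strip()
        if line.startswith('>'):
            if desc is not None:
                # a new header closes the previous record
                records.append((desc, ''.join(chunks)))
            desc = line[1:]
            chunks = []
        elif line and desc is not None:
            chunks.append(line)
    if desc is not None:
        # push the final record once the input runs out
        records.append((desc, ''.join(chunks)))
    return records

# Hypothetical sample mirroring the layout in the question
sample = """>name_1
data_1
>name_2
data_2
""".splitlines()

records = parse_fasta_records(sample)
for description, sequence in records:
    print(description + ' ' + sequence)
```

Because each description travels with its own sequence, a header with no data cannot shift the two lists out of step. If two parallel sequences are still wanted, zip(*records) recovers them as tuples, provided there is at least one record.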

The sys.stdin loop in your code does not appear to do anything useful.

tripleee