
I'm trying to write a program that takes a FASTA file and converts it into a 2D list pairing IDs with sequences. To do this, I split the text into a list of lines and create a 2D list. The 2D list has as many inner lists as there are IDs in the file, and each inner list consists of two empty strings. The program iterates over the list of lines, and when it comes to an ID, it concatenates it to the first entry in one of these inner lists. To keep track of which inner list I'm adding to, I initialize j to 0, use the inner list at index j in the 2D list, and then increase j by 1.

The trouble comes with concatenating. The program somehow keeps track of every previous ID it has encountered and adds them all at once to the next string. j increments correctly. This line doesn't work: `strings_list[j][0] = strings_list[j][0] + lines[i]`, but this one does: `strings_list[j][0] = lines[i]`. I can't figure out how `strings_list[j][0]` is saving previous values. Because of how the file is structured, I need to accumulate previous values for the sequences, so I want to understand what is wrong with the first line.

Here is the full code:

def dna_processing(filename):
    f = open(filename, 'r')
    txt = f.read()
    f.close()
    lines = txt.split("\n")


    seq_num = 0

    for i in range(len(lines)):

        if lines[i] == '':
            del lines[i]
        elif lines[i][0] == '>':
            seq_num = seq_num + 1

    strings_list = [["", ""]] * seq_num

    j = 0
    for i in range(len(lines)):
        if lines[i][0] == '>':
            strings_list[j][0] = strings_list[j][0] + lines[i]
            j = j + 1

        #else:
            #strings_list[j-1][1] = strings_list[j-1][1] + lines[i]





dna_processing("rosalind_grph.txt")
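
The symptom can be reproduced without the FASTA file. The sketch below is a minimal reproduction, not part of the original program: the header strings are made-up placeholders, and the names `shared` and `independent` are hypothetical. It compares a 2D list built with `*` against one built with a list comprehension.

```python
# Minimal reproduction with made-up header strings (not taken from the file).
headers = [">id_a", ">id_b", ">id_c"]
seq_num = len(headers)

# Multiplying the outer list repeats a reference to ONE inner list.
shared = [["", ""]] * seq_num
j = 0
for header in headers:
    shared[j][0] = shared[j][0] + header
    j = j + 1

# Every row is the same object, so each concatenation is visible in all rows.
all_same_object = all(row is shared[0] for row in shared)

# A list comprehension creates a fresh inner list on each iteration.
independent = [["", ""] for _ in range(seq_num)]
for j, header in enumerate(headers):
    independent[j][0] = independent[j][0] + header

print(shared)       # every row holds all three headers joined together
print(independent)  # each row holds only its own header
```

With the shared version, plain assignment (`shared[j][0] = header`) overwrites the same slot each time, so no accumulation shows up, whereas concatenation keeps appending to the one shared string.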

And here is an example of the text I am inputting:

>Rosalind_5931
AGAATAGGAAGCGCCGTGTTGAAATATAAGAGCACCCCAGACGTGTACTTTGTGTTGGTC
TCTGGCGACCATTCTGTGCGGT
>Rosalind_7410
GAACCTAAGGTCCATCGTCATAACTGCGACCCTACAAACAGATGGTTTCATGTGAAATAA
GTTAGGAACCAGAAAATCATAGCAGACGTA
>Rosalind_0759
GTTTGCATTAGTTCCTCGGGGTCACTCTCCTAGCTATATTGCATAATAACCAGGTGGCTC
CCGTTATGGCCCAAGACACTTGTTGGTAG
>Rosalind_6944
TACGCCGCCATAACAGGGTCCGAGCCGCAAGGTTGGTCCACCGTACTCCAACCATGGCTA
TCAAACGGTTGCAGAGCCACCGAACTGGGCG
>Rosalind_2801
GCTTTCAGGCTAAACCGACATGGTCCCCAATACTTTTAAGATCGGAGTCAAGGTTAAGAG
TGTGGCGTGTTAGCGGCCCTCA
eclare
  • What exactly does not work? What do you expect? – puncher Aug 21 '22 at 17:33
  • I only glanced at this very quickly, so this may be off the mark, but it smells like this: https://stackoverflow.com/questions/240178/list-of-lists-changes-reflected-across-sublists-unexpectedly – Ture Pålsson Aug 21 '22 at 17:34
  • (By the way, that first `for` loop is going to give you an IndexError sooner or later, because you loop over the length of the list *before* deleting any items.) – Ture Pålsson Aug 21 '22 at 17:37
  • @TurePålsson That's the same question, thank you! – eclare Aug 21 '22 at 17:39
  • Does this answer your question? [List of lists changes reflected across sublists unexpectedly](https://stackoverflow.com/questions/240178/list-of-lists-changes-reflected-across-sublists-unexpectedly) – Ture Pålsson Aug 25 '22 at 16:15
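
Putting the two comments together, one possible rewrite is sketched below. It is an illustration under stated assumptions, not the asker's final code: the helper name `parse_fasta_lines` is hypothetical, the leading `>` is kept in the ID as in the original program, and any sequence line that appears before the first header is silently ignored. Each ID gets its own freshly created inner list, and blank lines are skipped rather than deleted during iteration, which avoids the IndexError.

```python
def parse_fasta_lines(lines):
    """Pair each '>' header line with its concatenated sequence lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of deleting while looping
        if line.startswith(">"):
            records.append([line, ""])  # a new, independent inner list per ID
        elif records:
            records[-1][1] += line  # extend the most recent record's sequence
    return records


def dna_processing(filename):
    # Same entry point as in the question, now returning the 2D list.
    with open(filename, "r") as f:
        return parse_fasta_lines(f.read().split("\n"))


# First two records of the sample input from the question.
sample_lines = [
    ">Rosalind_5931",
    "AGAATAGGAAGCGCCGTGTTGAAATATAAGAGCACCCCAGACGTGTACTTTGTGTTGGTC",
    "TCTGGCGACCATTCTGTGCGGT",
    "",
    ">Rosalind_7410",
    "GAACCTAAGGTCCATCGTCATAACTGCGACCCTACAAACAGATGGTTTCATGTGAAATAA",
    "GTTAGGAACCAGAAAATCATAGCAGACGTA",
]
records = parse_fasta_lines(sample_lines)
for seq_id, sequence in records:
    print(seq_id, len(sequence))
```

Appending a new inner list when a header is seen also removes the need to count the IDs in a first pass or to track the index j by hand.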
