For this problem I have a separate txt file called DNA.txt which contains this:

>HSGLTH1 Human theta 1-globin gene
CCACTGCACTCACCGCACCCGGCCAATTTTTGTGTTTTTAGTAGAGACTAAATACCATATAGTGAACACCTAAGA
CGGGGGGCCTTGGATCCAGGGCGATTCAGAGGGCCCCGGTCGGAGCTGTCGGAGATTGAGCGCGCGCGGTCCCGG
GATCTCCGACGAGGCCCTGGACCCCCGGGCGGCGAAGCTGCGGCGCGGCGCCCCCTGGAGGCCGCGGGACCCCTG
GCCGGTCCGCGCAGGCGCAGCGGGGTCGCAGGGCGCGGCGGGTTCCAGCGCGGGGATGGCGCTGTCCGCGGAGGA
CCGGGCGCTGGTGCGCGCCCTGTGGAAGAAGCTGGGCAGCAACGTCGGCGTCTACACGACAGAGGCCCTGGAAAG
GTGCGGCAGGCTGGGCGCCCCCGCCCCCAGGGGCCCTCCCTCCCCAAGCCCCCCGGACGCGCCTCACCCACGTTC
CTCTCGCAGGACCTTCCTGGCTTTCCCCGCCACGAAGACCTACTTCTCCCACCTGGACCTGAGCCCCGGCTCCTC
ACAAGTCAGAGCCCACGGCCAGAAGGTGGCGGACGCGCTGAGCCTCGCCGTGGAGCGCCTGGACGACCTACCCCA
CGCGCTGTCCGCGCTGAGCCACCTGCACGCGTGCCAGCTGCGAGTGGACCCGGCCAGCTTCCAGGTGAGCGGCTG
CCGTGCTGGGCCCCTGTCCCCGGGAGGGCCCCGGCGGGGTGGGTGCGGGGGGCGTGCGGGGCGGGTGCAGGCGAG
TGAGCCTTGAGCGCTCGCCGCAGCTCCTGGGCCACTGCCTGCTGGTAACCCTCGCCCGGCACTACCCCGGAGACT
TCAGCCCCGCGCTGCAGGCGTCGCTGGACAAGTTCCTGAGCCACGTTATCTCGGCGCTGGTTTCCGAGTACCGCT
GAACTGTGGGTGGGTGGCCGCGGGATCCCCAGGCGACCTTCCCCGTGTTTGAGTAAAGCCTCTCCCAGGAGCAGC
CTTCTTGCCGTGCTCTCTCGAGGTCAGGACGCGAGAGGAAGGCGC

For this problem, I want a function, defined in a separate file, that takes this file name and returns a FASTA data structure. I believe that means the first line of DNA.txt, followed by the DNA sequence as a single string of letters with all whitespace ignored. I also want to close the file once I'm done using it.

Expected Output:

[’>HSGLTH1 Human theta 1-globin gene’, ’←-
CCACTGCACTCACCGCACCCGGCCAATTTTTGTGTTTTTAGT
5 AGAGACTAAATACCATATAGTGAACACCTAAGACGGGGGGC
6 CTTGGATCCAGGGCGATTCAGAGGGCCCCGGTCGGAGCTGT
7 CGGAGATTGAGCGCGCGCGGTCCCGGGATCTCCGACGAGGC
8 CCTGGACCCCCGGGCGGCGAAGCTGCGGCGCGGCGCCCCCT
9 GGAGGCCGCGGGACCCCTGGCCGGTCCGCGCAGGCGCAGCG
10 GGGTCGCAGGGCGCGGCGGGTTCCAGCGCGGGGATGGCGCT
11 GTCCGCGGAGGACCGGGCGCTGGTGCGCGCCCTGTGGAAGA
12 AGCTGGGCAGCAACGTCGGCGTCTACACGACAGAGGCCCTG
13 GAAAGGTGCGGCAGGCTGGGCGCCCCCGCCCCCAGGGGCCC
14 TCCCTCCCCAAGCCCCCCGGACGCGCCTCACCCACGTTCCTC
15 TCGCAGGACCTTCCTGGCTTTCCCCGCCACGAAGACCTACTT
16 CTCCCACCTGGACCTGAGCCCCGGCTCCTCACAAGTCAGAGC
17 CCACGGCCAGAAGGTGGCGGACGCGCTGAGCCTCGCCGTGG
18 AGCGCCTGGACGACCTACCCCACGCGCTGTCCGCGCTGAGC
19 CACCTGCACGCGTGCCAGCTGCGAGTGGACCCGGCCAGCTT
20 CCAGGTGAGCGGCTGCCGTGCTGGGCCCCTGTCCCCGGGAG
21 GGCCCCGGCGGGGTGGGTGCGGGGGGCGTGCGGGGCGGGT
22 GCAGGCGAGTGAGCCTTGAGCGCTCGCCGCAGCTCCTGGGC
23 CACTGCCTGCTGGTAACCCTCGCCCGGCACTACCCCGGAGAC
24 TTCAGCCCCGCGCTGCAGGCGTCGCTGGACAAGTTCCTGAGC
25 CACGTTATCTCGGCGCTGGTTTCCGAGTACCGCTGAACTGTG
26 GGTGGGTGGCCGCGGGATCCCCAGGCGACCTTCCCCGTGTTTG
27 AGTAAAGCCTCTCCCAGGAGCAGCCTTCTTGCCGTGCTCTCTC
28 GAGGTCAGGACGCGAGAGGAAGGCGC’]

This is what I have so far:

def get_DNA(name):
    with open(name, 'r') as dna:
        DNA_d = {}
        for line in dna:
            if line ????????
        return DNA_d

To clarify my problem: I am not sure how to treat "HSGLTH1 Human theta 1-globin gene" as a "header" that is separate from the rest of the text in the DNA.txt file. I want both parts to be returned by my get_DNA(name) function, which lives in a separate file.

Thank you for your time and kindness!

  • Welcome to Stack Overflow. To be clear: the file contains exactly two lines of text (a short one giving the gene name, and a very long one with the DNA sequence)? And you want to create a list of two strings, each of which is one of the lines of the file? In the expected output that you show, what do the numbers on the left-hand side mean, and where are they supposed to come from? And what is this `←-` thing? – Karl Knechtel Nov 10 '22 at 23:10
  • @KarlKnechtel Thank you Karl for the welcome! The numbers on the left side mark the different line breaks; they're not required! And yes, I would like to have one string that is the "header", which contains the 'HSGLTH1 Human theta 1-globin gene' part, and the remainder as the second string. I believe the ←- is demonstrating a new line or a line break? – starbucksdoubleshot Nov 10 '22 at 23:16
  • I am not sure I understood your question correctly; please clarify the problem. Also, you don't need to close the file yourself: the `with` block closes it once the block finishes executing. – Ambitions Nov 10 '22 at 23:18
  • Then you are simply creating a list of lines of text in the file; this is extremely well covered ground - please see the linked duplicate. – Karl Knechtel Nov 10 '22 at 23:20
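
A minimal sketch of what the comments above describe, assuming the file holds exactly one FASTA record (one header line followed by sequence lines). It keeps the question's `get_DNA(name)` signature but returns a two-element list, as in the expected output, rather than the `DNA_d` dictionary from the attempt; the local names `header` and `sequence` are illustrative.

```python
def get_DNA(name):
    """Return [header, sequence] for a single-record FASTA file."""
    with open(name, "r") as dna:
        # The first line is the FASTA header, e.g. ">HSGLTH1 Human ...".
        header = dna.readline().strip()
        # Everything after it is sequence. str.split() with no arguments
        # splits on any whitespace, so joining the pieces removes the
        # newlines (and any stray spaces) between the sequence lines.
        sequence = "".join(dna.read().split())
    # Leaving the with block has already closed the file at this point.
    return [header, sequence]
```

Calling `header, sequence = get_DNA("DNA.txt")` would then give the two strings separately. A file with several records would need extra handling, for example starting a new entry at each line that begins with `>`; that case is not covered here.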
