I am trying to write a program that finds the genes in a genome string.

Context:

Biologists use a sequence of letters A, C, T and G to model a genome.
A gene is a substring of a genome that starts after a triplet ATG and ends before a triplet TAG, TAA, or TGA.
Furthermore, the length of a gene string is a multiple of 3 and the gene does not contain any of the triplets ATG, TAG, TAA and TGA.
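
The rules above can be sketched as a small validity check. This is only an illustration under stated assumptions: the helper name `is_gene` is hypothetical, the forbidden triplets are rejected anywhere in the candidate (the stricter reading of "does not contain"), and an empty candidate is treated as not a gene.

```python
# Triplets that a gene body must not contain, per the definition above.
FORBIDDEN_TRIPLETS = ("ATG", "TAG", "TAA", "TGA")

def is_gene(candidate):
    """Check a candidate gene body against the stated rules.

    Assumptions (not stated in the exercise): an empty string is rejected,
    and a forbidden triplet counts wherever it occurs in the candidate.
    """
    if not candidate or len(candidate) % 3 != 0:
        return False
    return not any(triplet in candidate for triplet in FORBIDDEN_TRIPLETS)

print(is_gene("TTT"))     # valid: length 3, no forbidden triplet
print(is_gene("GGGCGT"))  # valid: length 6, no forbidden triplet
print(is_gene("TTTT"))    # invalid: length is not a multiple of 3
print(is_gene("GGATGG"))  # invalid: contains ATG
```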

My desired result is:

>>Enter a genome string:>>TTATGTTTTAAGGATGGGGCGTTAGTT
Output:
>>TTT
>>GGGCGT
>>Enter a genome string:>>TGTGTGTATAT
>>No gene is found

So far I have got:

import re

def findGene(gene):
  pattern = re.compile(r'ATG((?:[ACTG]{3})*?)(?:TAG|TAA|TGA)')
  return pattern.findall(gene)

  findGene('TTATGTTTTAAGGATGGGGCGTTAGTT')

def main():
  geneinput = input("Enter a genome string: ")
  print(findGene(geneinput))


main()

# TTATGTTTTAAGGATGGGGCGTTAGTT

How can I make this code work properly?

Thank you.

A.Alessio
Raj

1 Answer

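import re

# ATG, then as few whole triplets as possible, then the first in-frame TAG, TAA, or TGA.
GENE_PATTERN = re.compile(r'ATG((?:[ACTG]{3})*?)(?:TAG|TAA|TGA)')

def findGene(gene):
    # findall returns only the captured group, i.e. the text between start and stop.
    return GENE_PATTERN.findall(gene)

def main():
    geneinput = input("Enter a genome string: ")
    # Print one gene per line; an empty result is falsy, so `or` supplies the message.
    print('\n'.join(findGene(geneinput)) or 'No gene is found')


main()

# Example input: TTATGTTTTAAGGATGGGGCGTTAGTT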
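
To see why the pattern works, here is a minimal sketch that spells out the same regular expression with `re.VERBOSE` so each part can carry a comment, then runs it on the two inputs from the question. The names `GENE_RE` and `find_genes` are chosen for illustration only. Note one assumption: the lazy quantifier stops at the first stop triplet in the reading frame that begins right after ATG, but it does not reject a captured body that itself contains ATG.

```python
import re

# Same pattern as above, split across lines; whitespace and comments are ignored
# inside the pattern because of re.VERBOSE.
GENE_RE = re.compile(r"""
    ATG                  # start triplet, not part of the gene
    ((?:[ACTG]{3})*?)    # capture as few whole triplets as possible
    (?:TAG|TAA|TGA)      # first stop triplet in the same reading frame
""", re.VERBOSE)

def find_genes(genome):
    """Return the captured gene bodies in order of appearance."""
    return GENE_RE.findall(genome)

print(find_genes("TTATGTTTTAAGGATGGGGCGTTAGTT"))  # ['TTT', 'GGGCGT']
print(find_genes("TGTGTGTATAT"))                  # []
```

Because the search resumes after each stop triplet, the second ATG in the first input is found on its own and yields the second gene.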
Raj
  • In case you need to be able to explain why this works, you need to understand the notion of 'truthy' and 'falsy' (sometimes spelled 'falsey'). See https://stackoverflow.com/questions/39983695/what-is-truthy-and-falsy-how-is-it-different-from-true-and-false – jarmod May 14 '20 at 15:18
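
A short sketch of the truthy/falsy behaviour the comment refers to: `or` returns its left operand when that operand is truthy and its right operand otherwise, and an empty list is falsy. The function name `describe` is hypothetical and used only for this illustration.

```python
def describe(genes):
    # A non-empty list is truthy, so it is returned unchanged.
    # An empty list is falsy, so the fallback string is returned instead.
    return genes or "No gene is found"

print(describe(["TTT", "GGGCGT"]))  # ['TTT', 'GGGCGT']
print(describe([]))                 # No gene is found
```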