I don't think regex is expressive enough to handle this with the length requirement.
However, you can break the problem down by using a window iterator to simulate an open reading frame:
# From http://stackoverflow.com/questions/6822725/rolling-or-sliding-window-iterator-in-python:
from itertools import islice

def window(seq, n=2):
    """Returns a sliding window (of width n) over data from the iterable

    s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...
    """
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

sequence = "ATG GTC TGA CGA CGG CAG TAA AAA AAA GGG TGG GCA GCC TTT GAA GCC TTT"
codons = sequence.split()
orf = window(codons, 7)
matching_codons = ['TGA', 'TAA', 'TAG']
[sequence for sequence in orf if any(codon in matching_codons for codon in sequence)]
Dissecting the code
orf = window(codons, 7)
This defines a generator that yields each frame of 7 codons, advancing the frame by one codon per iteration.
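To see what the generator produces, here is a small sketch that runs the same `window` function on a shortened codon list with a width of 3. The input `short_codons` and the width are assumptions chosen only to keep the output readable:

```python
from itertools import islice

def window(seq, n=2):
    """Yield a sliding window (of width n) over the iterable."""
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

# Shortened example input (hypothetical, not the full sequence from above)
short_codons = ["ATG", "GTC", "TGA", "CGA"]
frames = list(window(short_codons, 3))
print(frames)
# [('ATG', 'GTC', 'TGA'), ('GTC', 'TGA', 'CGA')]
```

Each frame overlaps the previous one by all but one codon, and an input shorter than the window yields no frames at all.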
Then, the list comprehension does two things.
It iterates over each sequence in our ORF:
[sequence for sequence in orf] # returns all possible frames of length 7 in sequence
It filters the result, keeping only the frames that contain at least one of the matching codons:
[sequence for sequence in orf if any(codon in ['TGA', 'TAA', 'TAG'] for codon in sequence)] # Keeps only frames that contain 'TGA', 'TAA', or 'TAG'
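If the comprehension is hard to read, the same filter can be written as an explicit loop. This is only a sketch: the `frames` list is a hand-written stand-in for what the window generator would yield (shortened to 3 codons per frame), and `kept` is a name introduced here for illustration.

```python
# Hypothetical frames standing in for the output of the window generator
frames = [
    ("ATG", "GTC", "TGA"),
    ("AAA", "GGG", "TGG"),
    ("CAG", "TAA", "AAA"),
]
matching_codons = {"TGA", "TAA", "TAG"}  # a set gives a cheap membership test

kept = []
for frame in frames:
    # any() stops at the first codon found in matching_codons
    if any(codon in matching_codons for codon in frame):
        kept.append(frame)

print(kept)
# [('ATG', 'GTC', 'TGA'), ('CAG', 'TAA', 'AAA')]
```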
Finally, if you want the result to be the substrings themselves, use the following list comprehension:
[' '.join(sequence) for sequence in window(codons, 7) if any(codon in ['TGA', 'TAA', 'TAG'] for codon in sequence)]
Result:
['ATG GTC TGA CGA CGG CAG TAA', 'GTC TGA CGA CGG CAG TAA AAA', 'TGA CGA CGG CAG TAA AAA AAA', 'CGA CGG CAG TAA AAA AAA GGG', 'CGG CAG TAA AAA AAA GGG TGG', 'CAG TAA AAA AAA GGG TGG GCA', 'TAA AAA AAA GGG TGG GCA GCC']
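For reuse, the steps could be wrapped in a single function. This is a sketch rather than part of the approach as originally written: the name `frames_with_match` and its parameters are made up for illustration, and it assumes the codons in the input string are separated by whitespace, as in the example sequence.

```python
from itertools import islice

def window(seq, n=2):
    """Yield a sliding window (of width n) over the iterable."""
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

def frames_with_match(sequence, width, matching_codons=("TGA", "TAA", "TAG")):
    """Return each frame of `width` codons (joined into a string) that
    contains at least one of `matching_codons`.

    Hypothetical helper; assumes whitespace-separated codons.
    """
    codons = sequence.split()
    targets = set(matching_codons)
    return [
        " ".join(frame)
        for frame in window(codons, width)
        if any(codon in targets for codon in frame)
    ]

sequence = "ATG GTC TGA CGA CGG CAG TAA AAA AAA GGG TGG GCA GCC TTT GAA GCC TTT"
hits = frames_with_match(sequence, 7)
print(len(hits))  # 7
print(hits[0])    # ATG GTC TGA CGA CGG CAG TAA
```

Building a fresh generator inside the function on every call also avoids reusing one that has already been consumed, since a generator can only be iterated once.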