
I want to extract the exact nucleotide sequence "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA" wherever it occurs in a FASTQ file, using a regex in Python. In the file, the second line of each record (the line following the one that starts with @) is a nucleotide sequence. This is the code I tried:

import re

with open('last_mock.fastq', 'r') as rf:
    for line in rf:
        x = re.match(r"(GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA)", line)
        if x:
            print(x)
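
One thing worth noting about the attempt above: `re.match` only matches at the very beginning of the string, while `re.search` scans the whole string. A minimal sketch of the difference, assuming a made-up read (the `read` value below is hypothetical and not taken from the file):

```python
import re

ADAPTER = "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA"

# Hypothetical read with four extra bases in front of the sequence of interest.
read = "ACGT" + ADAPTER + "TTGA"

# re.match anchors at position 0, so it finds nothing here.
at_start = re.match(ADAPTER, read)

# re.search looks for the pattern anywhere in the string.
anywhere = re.search(ADAPTER, read)

print(at_start)          # None
print(anywhere.span())   # start and end offsets of the hit within the read
```

If the sequence can sit in the middle of a read, or if the lines carry leading whitespace, `re.search` is probably the closer fit.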

This is the content of the file:

@HWUSI-EAS570R_0003:2:50:5038:17424#0/1
CAGCTTCTGTTGATGCTGATTTAATTCCTGCAACTA
+HWUSI-EAS570R_0003:2:50:5038:17424#0/1
hhhhhhhhhhhgghhhhhahhhhhhhhhhhhgfhh[
@HWUSI-EAS570R_0003:2:50:5175:17417#0/1
CACCTTGCTTTATGGGAAAGCGTAACATAACTACAG
+HWUSI-EAS570R_0003:2:50:5175:17417#0/1
hhhhhhhhhhhfhhhhfaehhhhgahehhcghhfch
@HWUSI-EAS570R_0003:2:50:5442:17417#0/1
AGTTCGCCGACGTTTACGCCGCCTCGGTCCTCGGCA
+HWUSI-EAS570R_0003:2:50:5442:17417#0/1
ghhhhhhhhhhhhhhfhhhhhhhfhhgfhhgfgffc
@HWUSI-EAS570R_0003:2:50:5552:17421#0/1
AAGACATCAAACTACGAAACTACTACAAGAAAACAT
+HWUSI-EAS570R_0003:2:50:5552:17421#0/1
hghghhhhhhhhhghhhhhhghhhhhehhhhheg`h
@HWUSI-EAS570R_0003:2:50:5658:17415#0/1
GTTCAAGTGATTCTCCTGCCTCAGCCTCCTGAGTAG
+HWUSI-EAS570R_0003:2:50:5658:17415#0/1
hhhhhfhghdhhhhhhhhhhhgghhfheffhdfcbf
@HWUSI-EAS570R_0003:2:50:5712:17421#0/1
TTTCTTTTACCCCTAATCCTATCAGCTTTTTCTCCC
+HWUSI-EAS570R_0003:2:50:5712:17421#0/1
hhhghhhhhhhhhhhhhhghhhghhhhhghhhghhh
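
Given the layout above, one possible approach is to read the file four lines at a time and search only the sequence line of each record. This is a rough sketch, not a definitive solution: it assumes every record is exactly four lines (no wrapped sequences), and `find_adapter` is a hypothetical helper name introduced here for illustration.

```python
import re
from itertools import islice

# Compile once; the pattern is a plain literal, so no escaping is needed.
ADAPTER = re.compile("GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA")

def find_adapter(lines):
    """Yield (read_id, start, end) for each hit found in a sequence line.

    `lines` is any iterable of text lines, e.g. an open file object.
    Assumes strict 4-line FASTQ records: header, sequence, '+', quality.
    """
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            break  # end of input (or a truncated trailing record)
        header = record[0].strip()
        seq = record[1].strip()  # strip() also drops stray leading spaces
        for m in ADAPTER.finditer(seq):
            yield header, m.start(), m.end()
```

It could be used as `for hit in find_adapter(rf): print(hit)` inside the `with open('last_mock.fastq', 'r') as rf:` block. Note that none of the six reads shown above contains the sequence, so this sample alone would be expected to produce no hits.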
Wai Ha Lee
kira.99

0 Answers