I want to extract an exact nucleotide sequence ("GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA") wherever it occurs in a FASTQ file, using a regex in Python. In the file, the line following each line that starts with @ is a nucleotide sequence. This is the code I tried:
import re
with open('last_mock.fastq','r') as rf:
    for line in rf:
        x = re.match(r"(GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA)", line)
        if x:
            print(x)
This is the content of the file:
@HWUSI-EAS570R_0003:2:50:5038:17424#0/1
CAGCTTCTGTTGATGCTGATTTAATTCCTGCAACTA
+HWUSI-EAS570R_0003:2:50:5038:17424#0/1
hhhhhhhhhhhgghhhhhahhhhhhhhhhhhgfhh[
@HWUSI-EAS570R_0003:2:50:5175:17417#0/1
CACCTTGCTTTATGGGAAAGCGTAACATAACTACAG
+HWUSI-EAS570R_0003:2:50:5175:17417#0/1
hhhhhhhhhhhfhhhhfaehhhhgahehhcghhfch
@HWUSI-EAS570R_0003:2:50:5442:17417#0/1
AGTTCGCCGACGTTTACGCCGCCTCGGTCCTCGGCA
+HWUSI-EAS570R_0003:2:50:5442:17417#0/1
ghhhhhhhhhhhhhhfhhhhhhhfhhgfhhgfgffc
@HWUSI-EAS570R_0003:2:50:5552:17421#0/1
AAGACATCAAACTACGAAACTACTACAAGAAAACAT
+HWUSI-EAS570R_0003:2:50:5552:17421#0/1
hghghhhhhhhhhghhhhhhghhhhhehhhhheg`h
@HWUSI-EAS570R_0003:2:50:5658:17415#0/1
GTTCAAGTGATTCTCCTGCCTCAGCCTCCTGAGTAG
+HWUSI-EAS570R_0003:2:50:5658:17415#0/1
hhhhhfhghdhhhhhhhhhhhgghhfheffhdfcbf
@HWUSI-EAS570R_0003:2:50:5712:17421#0/1
TTTCTTTTACCCCTAATCCTATCAGCTTTTTCTCCC
+HWUSI-EAS570R_0003:2:50:5712:17421#0/1
hhhghhhhhhhhhhhhhhghhhghhhhhghhhghhh
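For reference, here is a minimal sketch of one way the search could be written. It rests on a few assumptions: every record is exactly four lines (header, sequence, +, quality), and the sequence should be found anywhere in a read, not only at its start. The name find_adapter and the demo record @demo_read are hypothetical, made up for illustration. Two points about the attempt above: re.match only matches at the beginning of a line, so the sketch uses finditer, which scans the whole line; and none of the six reads in the excerpt equals the 36-base target, so no output is expected for that excerpt either way.

```python
import re

ADAPTER = "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA"
# re.escape is not strictly needed for plain A/C/G/T, but keeps the pattern literal.
PATTERN = re.compile(re.escape(ADAPTER))


def find_adapter(lines):
    """Yield (header, start, matched_sequence) for each hit in a sequence line.

    Assumes 4-line FASTQ records. Line position is used instead of a
    leading '@' test because quality lines may also begin with '@'.
    """
    header = None
    for index, line in enumerate(lines):
        position = index % 4
        if position == 0:
            header = line.strip()
        elif position == 1:
            # finditer searches the whole line, unlike re.match.
            for match in PATTERN.finditer(line):
                yield header, match.start(), match.group()


# Hypothetical record (not from the file above) with the target embedded.
demo = ["@demo_read\n", "AC" + ADAPTER + "GT\n", "+\n", "h" * 40 + "\n"]
for header, start, sequence in find_adapter(demo):
    print(header, start, sequence)
```

Any iterable of lines works, so an open file handle can be passed in place of the demo list, for example find_adapter(rf) inside the with open('last_mock.fastq','r') as rf: block.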