I am new to Python and want to use it to extract the text between two matching patterns on each line of my tab-delimited text file (mydata.txt).
mydata.txt:
Sequence tRNA Bounds tRNA Anti Intron Bounds Cove
Name tRNA # Begin End Type Codon Begin End Score
-------- ------ ---- ------ ---- ----- ----- ---- ------
lcl|NC_035155.1_gene_75[locus_tag=SS1G_20133][db_xref=GeneID:33 1 1 71 Pseudo ??? 0 0 -1
lcl|NC_035155.1_gene_73[locus_tag=SS1G_20131][db_xref=GeneID:33 1 1 73 Pseudo ??? 0 0 -1
lcl|NC_035155.1_gene_72[locus_tag=SS1G_20130][db_xref=GeneID:33 1 1 71 Pseudo ??? 0 0 -1
lcl|NC_035155.1_gene_71[locus_tag=SS1G_20129][db_xref=GeneID:33 1 1 72 Pseudo ??? 0 0 -1
lcl|NC_035155.1_gene_62[locus_tag=SS1G_20127][db_xref=GeneID:33 1 1 71 Pseudo ??? 0 0 -1
Code I tried:
lines = []  # Declare an empty list named "lines"
with open('/media/owner/c3c5fbb4-73f6-45dc-a475-988ad914056e/phasing/trna/test.txt') as input_data:
    # Skips text before the beginning of the interesting block:
    for line in input_data:
        # print(line)
        if line.strip() == "locus_tag=":  # Or whatever test is needed
            break
    # Reads text until the end of the block:
    for line in input_data:  # This keeps reading the file
        if line.strip() == "][db":
            break
        print(line)  # Line is extracted (or block_of_lines.append(line), etc.)
I want to grab the text between "[locus_tag=" and "][db_xre" on each line and get these as my results:
SS1G_20133
SS1G_20131
SS1G_20130
SS1G_20129
SS1G_20127
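
For reference, here is a minimal sketch of the extraction described above, assuming each data line contains exactly one "[locus_tag=...][db_xre" pair and that a regular expression is an acceptable approach. The names LOCUS_TAG and extract_locus_tags are made up for illustration, and the sample list stands in for the real file. Note that the comparison in my attempt tests whether the whole stripped line equals the marker, whereas this sketch searches for the markers inside each line.

```python
import re

# Capture the text between the literal markers "[locus_tag=" and "][db_xre".
# The brackets are escaped because "[" and "]" are special in regular expressions.
LOCUS_TAG = re.compile(r"\[locus_tag=(.*?)\]\[db_xre")


def extract_locus_tags(lines):
    """Return the captured tag from every line that contains both markers."""
    tags = []
    for line in lines:
        match = LOCUS_TAG.search(line)
        if match:  # header and separator lines have no markers and are skipped
            tags.append(match.group(1))
    return tags


# Sample lines copied from mydata.txt; an open file object can be passed
# instead, e.g. `with open("mydata.txt") as fh: extract_locus_tags(fh)`.
sample = [
    "Sequence tRNA Bounds tRNA Anti Intron Bounds Cove",
    "lcl|NC_035155.1_gene_75[locus_tag=SS1G_20133][db_xref=GeneID:33 1 1 71 Pseudo ??? 0 0 -1",
    "lcl|NC_035155.1_gene_73[locus_tag=SS1G_20131][db_xref=GeneID:33 1 1 73 Pseudo ??? 0 0 -1",
]

for tag in extract_locus_tags(sample):
    print(tag)
```

The non-greedy `(.*?)` stops at the first "][db_xre" after the opening marker, so any later brackets on the same line would not be swallowed into the result.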