I have files that look like:
chr1:92092600 G[chr2:164084669[ ENSG00000189195 ENST00000342818 BTBD8 chr2:164084669
chr1:121498879 T[chr9:2781522[ ENSG00000233432 ENST00000425455 AL592494.2 chr9:2781522
chr2:101298260 ]chr3:196435392]A ENSG00000163162 ENST00000295317 RNF149 chr3:196435392
chr2:164084669 ]chr1:92092600]G ENSG00000237844 ENST00000429636 AC016766.1 chr1:92092600
chr9:2781522 ]chr1:121498879]T ENSG00000080608 ENST00000490444 PUM3 chr1:121498879
chr3:196435392 A[chr2:101298260[ ENSG00000163960 ENST00000296328 UBXN7 chr2:101298260
And for every element in column 6 I would like to search column 1, and if present - print the entire line. So expected output for the first 3 elements in column 6 should look like:
chr2:164084669 ]chr1:92092600]G ENSG00000237844 ENST00000429636 AC016766.1 chr1:92092600
chr9:2781522 ]chr1:121498879]T ENSG00000080608 ENST00000490444 PUM3 chr1:121498879
chr3:196435392 A[chr2:101298260[ ENSG00000163960 ENST00000296328 UBXN7 chr2:101298260
So far I have:
import pandas as pd
pd.options.display.max_colwidth = 100
file = open("data.txt", 'r')
chrA =[]
chrB = []
Bgenes = []
for line in file.readlines():
chrA.append(line.split()[0])
chrB.append(line.split()[5])
for pos in chrB:
if pos in chrA:
Bgenes.append(line)