I ran P-Match (http://www.gene-regulation.com/cgi-bin/pub/programs/pmatch/bin/p-match.cgi) and need to process its output so that only the sequence ID, start position and end position remain. How can I extract this coordinate information from the result? Below is an example of the output.
Scanning sequence ID: BEST1_HUMAN
150 (-) 1.000 0.997 GGAAAggccc R05891
354 (+) 0.988 0.981 gtgtAGACAtt R06227
V$CREL_01 c-Rel V$EVI1_05 Evi-1
Scanning sequence ID: 4F2_HUMAN
365 (+) 1.000 1.000 gggacCTACA R05884
789 (-) 1.000 1.000 gcgCGAAA R05828; R05834; R05835; R05838; R05839
V$CREL_01 c-Rel V$E2F_02 E2F
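Each hit line in the example appears to hold the position, the strand in parentheses, two scores, the matched site and one or more accession numbers, separated by whitespace. Assuming that layout, a regular expression can pull out the start and the site sequence and compute the end. A minimal Python sketch (`parse_hit_line` is a hypothetical helper name):

```python
import re
from typing import Optional, Tuple

# Assumed hit-line layout: position, (strand), two scores, matched site, accession(s)
HIT_RE = re.compile(r"^\s*(\d+)\s+\(([+-])\)\s+[\d.]+\s+[\d.]+\s+([A-Za-z]+)\b")


def parse_hit_line(line: str) -> Optional[Tuple[int, int, str]]:
    """Return (start, end, strand) for a hit line, or None for any other line."""
    m = HIT_RE.match(line)
    if m is None:
        return None
    start = int(m.group(1))
    site = m.group(3)
    # End = start + length of the matched site, as in the expected output
    return start, start + len(site), m.group(2)


print(parse_hit_line("150 (-) 1.000 0.997 GGAAAggccc R05891"))
# (150, 160, '-')
print(parse_hit_line("Scanning sequence ID: BEST1_HUMAN"))
# None
```

The end here follows the convention in the question (start + site length); if you need an inclusive end coordinate instead, subtract 1.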
Expected output, where the end position is the start position plus the length of the matched short sequence (for example, GGAAAggccc is 10 bases long, so 150 + 10 = 160):
Sequence ID start end
BEST1_HUMAN 150 160
BEST1_HUMAN 354 365
4F2_HUMAN 365 375
4F2_HUMAN 789 797
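One way to get this table is to remember the most recent "Scanning sequence ID:" header and emit one row per hit line that follows it. The sketch below assumes the output was saved as plain text in the layout shown above; `pmatch_to_coords` and `SAMPLE` are hypothetical names, and if your saved output has extra columns (for example a matrix name or `|` separators before the position), `HIT_RE` would need adjusting.

```python
import re
from typing import Iterable, Iterator, Optional, Tuple

# "Scanning sequence ID: <id>" starts a new block of hits
HEADER_RE = re.compile(r"^\s*Scanning sequence ID:\s*(\S+)")
# Assumed hit layout: position, (strand), two scores, matched site, accession(s)
HIT_RE = re.compile(r"^\s*(\d+)\s+\([+-]\)\s+[\d.]+\s+[\d.]+\s+([A-Za-z]+)\b")


def pmatch_to_coords(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (sequence_id, start, end) per hit, with end = start + site length."""
    seq_id: Optional[str] = None
    for line in lines:
        header = HEADER_RE.match(line)
        if header:
            seq_id = header.group(1)
            continue
        hit = HIT_RE.match(line)
        if hit and seq_id is not None:
            start = int(hit.group(1))
            yield seq_id, start, start + len(hit.group(2))
        # Every other line (e.g. the matrix/factor name lines) is ignored


SAMPLE = """\
Scanning sequence ID: BEST1_HUMAN
150 (-) 1.000 0.997 GGAAAggccc R05891
354 (+) 0.988 0.981 gtgtAGACAtt R06227
V$CREL_01 c-Rel V$EVI1_05 Evi-1
Scanning sequence ID: 4F2_HUMAN
365 (+) 1.000 1.000 gggacCTACA R05884
789 (-) 1.000 1.000 gcgCGAAA R05828; R05834; R05835; R05838; R05839
V$CREL_01 c-Rel V$E2F_02 E2F
"""

print("Sequence ID\tstart\tend")
for seq_id, start, end in pmatch_to_coords(SAMPLE.splitlines()):
    print(f"{seq_id}\t{start}\t{end}")
```

To process a saved file instead of the sample string, pass the open file handle, e.g. `with open("pmatch_output.txt") as fh: rows = list(pmatch_to_coords(fh))` (the file name is only an example).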
Can anyone help me?