I have a file with a fixed structure, and I would like to delete the lines where that structure is not met. The structure should be: 1) a line starting with the word "Sequence", 2) a line starting with the word "Start", 3) a line starting with a number.
In my file, some records are missing the number line but still have the first two lines (the number line was removed with grep). I am looking for a way, with awk or sed, to remove the two preceding lines when no number line follows them. Is this possible?
cat file.txt
Sequence: HM855457_IGHV1-8*02_Homosapiens_F_V-REGION_24..319_296nt_1_____296+0=296__rev-compl_ from: 1 to: 296
Start End Strand Pattern Mismatch Sequence
217 225 + pattern:AA[CT]NNN[AT]CN . aacacctcc
Sequence: MG719312_IGHV1-8*03_Homosapiens_F_V-REGION_127..422_296nt_1_____296+0=296___ from: 1 to: 296
Start End Strand Pattern Mismatch Sequence
217 225 + pattern:AA[CT]NNN[AT]CN . aacacctcc
Sequence: M99648_IGHV2-26*01_Homosapiens_F_V-REGION_164..464_301nt_1_____301+0=301___ from: 1 to: 301
Start End Strand Pattern Mismatch Sequence
Sequence: L21969_IGHV2-70*01_Homosapiens_F_V-REGION_144..444_301nt_1_____301+0=301___ from: 1 to: 301
Start End Strand Pattern Mismatch Sequence
176 184 + pattern:AA[CT]NNN[AT]CN . aatactaca
Sequence: X92241_IGHV2-70*02_Homosapiens_F_V-REGION_144..433_290nt_1_____290+0=290_partialin3'__ from: 1 to: 290
Start End Strand Pattern Mismatch Sequence
176 184 + pattern:AA[CT]NNN[AT]CN . aatactaca
Expected output:
cat file.txt
Sequence: HM855457_IGHV1-8*02_Homosapiens_F_V-REGION_24..319_296nt_1_____296+0=296__rev-compl_ from: 1 to: 296
Start End Strand Pattern Mismatch Sequence
217 225 + pattern:AA[CT]NNN[AT]CN . aacacctcc
Sequence: MG719312_IGHV1-8*03_Homosapiens_F_V-REGION_127..422_296nt_1_____296+0=296___ from: 1 to: 296
Start End Strand Pattern Mismatch Sequence
217 225 + pattern:AA[CT]NNN[AT]CN . aacacctcc
Sequence: L21969_IGHV2-70*01_Homosapiens_F_V-REGION_144..444_301nt_1_____301+0=301___ from: 1 to: 301
Start End Strand Pattern Mismatch Sequence
176 184 + pattern:AA[CT]NNN[AT]CN . aatactaca
Sequence: X92241_IGHV2-70*02_Homosapiens_F_V-REGION_144..433_290nt_1_____290+0=290_partialin3'__ from: 1 to: 290
Start End Strand Pattern Mismatch Sequence
176 184 + pattern:AA[CT]NNN[AT]CN . aatactaca
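One possible approach, sketched in awk: hold back the "Sequence" and "Start" lines and print them only once a line starting with a digit shows up. The function name drop_incomplete_records is made up for illustration. The sketch assumes every record begins with a "Sequence" line, that a record may contain more than one number line (all of them are kept), and that lines matching none of the three patterns can be dropped.

```shell
# Hypothetical helper: reads the file(s) given as arguments, or stdin if none.
drop_incomplete_records() {
  awk '
    # New record: remember the Sequence line, discarding any pending
    # header that never received a number line.
    /^Sequence/ { header = $0; next }

    # Append the Start line to the pending header (ignored if no Sequence
    # line came first).
    /^Start/    { if (header != "") header = header "\n" $0; next }

    # Number line: the record is complete, so flush the pending header
    # once, then print the number line itself.
    /^[0-9]/    { if (header != "") { print header; header = "" } print; next }

    # Assumption: any other line is not part of the structure and is dropped.
  ' "$@"
}
```

It could then be run as: drop_incomplete_records file.txt > filtered.txt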