In bash (4.3.46(1)) I have some multi-line, so-called FASTA records, where each record starts with one line containing >name and the following lines contain DNA sequence ([AGCTNacgtn]). Here are three records:
>chr1
AGCTACTTTT
AGGGNGGTNN
>chr2
TTGNACACCC
TGGGGGAGTA
>chr3
TGACGTGGGT
TCGGGTTTTT
How do I use grep in bash to get the second record? In other languages one might use:
>chr2\n([AGCTNagctn]*\n)*
In bash I tried to use the ideas from here (among other Stack Overflow posts). This did not work:
grep -zo '>chr2[AGCTNacgtn]+' file
Result should be:
>chr2
TTGNACACCC
TGGGGGAGTA
SOLUTION
On my system this was the solution (almost the same as Cyrus' answer below, i.e. without the pipe to a second grep):
grep -Pzo '>chr2\n[AGCTNacgtn\n]+' file
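For anyone who wants to try this end to end, here is a minimal sketch that recreates the sample records and extracts the >chr2 record. It assumes GNU grep built with PCRE support (needed for -P); the variable names fasta_file and record and the use of mktemp and tr are only illustrative choices, not part of the original command.

```shell
# Recreate the sample input from the question in a temporary file.
fasta_file=$(mktemp)
printf '%s\n' \
  '>chr1' 'AGCTACTTTT' 'AGGGNGGTNN' \
  '>chr2' 'TTGNACACCC' 'TGGGGGAGTA' \
  '>chr3' 'TGACGTGGGT' 'TCGGGTTTTT' > "$fasta_file"

# -P: Perl-compatible regex, so \n matches a newline
# -z: read NUL-separated input, so the whole file is treated as one "line"
# -o: print only the matched part
# With -z the output is NUL-terminated, so tr strips that trailing NUL.
record=$(grep -Pzo '>chr2\n[AGCTNacgtn\n]+' "$fasta_file" | tr -d '\0')

printf '%s\n' "$record"
rm -f "$fasta_file"
```

The earlier attempt most likely failed for two reasons: without -P or -E, the + is taken as a literal plus sign in a basic regular expression, and the pattern has no way to match the newlines between >chr2 and the sequence lines. Putting \n inside the bracket expression lets the match run across lines until it reaches the > of the next record.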