If I have a file like this:
abc defghaijkb,mnaobpqa
pbqaaa
qrs - a .. b ...
cde
How can I extract all the parts that start with a and end with b? (I chose these characters to simplify the example; they may be replaced with more complex regexes.) This is the desired output:
ab
aijkb
aob
a .. b
(Each item goes on a separate line.) Since there is no non-greedy matching (.*?) in (g)awk, I cannot find a way to solve this (e.g. using split).
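For the single-character case only, a negated bracket expression can stand in for the missing non-greedy operator, since a[^b]*b can never run past the first b. Below is a minimal sketch of that idea; the function name extract_spans is just for illustration. It does not carry over to multi-character or regex delimiters, which is the case I actually care about.

```sh
# Sketch: print every shortest span from "a" to the next "b", one per line.
# Only valid because the end delimiter is a single character.
extract_spans() {
  awk '{
    s = $0
    # match() sets RSTART/RLENGTH to the leftmost match in s
    while (match(s, /a[^b]*b/)) {
      print substr(s, RSTART, RLENGTH)
      # continue scanning after the end of this match
      s = substr(s, RSTART + RLENGTH)
    }
  }' "$@"
}
```

Usage: extract_spans file, or pipe the input into extract_spans on stdin. awk reads the input line by line, so memory use should stay bounded by the longest line.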
Note 1: there is no need for multiline matching; that is, no newlines are allowed between regex1 and regex2.
Note 2: I don't want to use sed. I want to know whether this can be done with awk, bash, or some other command-line tool that processes an input file line by line. AWK seems like a nice solution, if only it supported non-greedy .*?
Note 3: I cannot use grep because I always get a "memory exhausted" error when I deal with huge files.
Note 4: here is an example of a more complex regex1 and regex2. What if they themselves contain non-greedy .*? parts? E.g. <a>.*?<b>.*?</b>.*?</a>.
Update. More complex example:
[a]text1[a]text000[b]text2[/b]text11[/a]c defgh[a]text3[b]text33[/b]text333[/a]...[/a],mnaobpqa
...[b]aa[/b]bb[/a],,,
qa - [a][b][/b][/a] aabbcc ...
cde
Desired output:
[a]text000[b]text2[/b]text11[/a]
[a]text3[b]text33[/b]text333[/a]
[a][b][/b][/a]
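One possible workaround for this particular example is sketched below, under a strong assumption: the input never contains the control bytes \001 to \004. Each literal tag is swapped for one of those bytes, so a negated bracket expression can again play the role of the non-greedy gaps, and the tags are restored before printing. The function name extract_nested and the placeholder bytes are illustrative choices. This only handles fixed-string delimiters, not arbitrary regexes.

```sh
# Sketch: emulate [a].*?[b].*?[/b].*?[/a] by mapping each tag to a
# placeholder byte (assumed absent from the input) and excluding all
# placeholders from the gaps between tags.
extract_nested() {
  awk '
  BEGIN {
    A = "\001"; B = "\002"; C = "\003"; D = "\004"
    gap = "[^" A B C D "]*"          # any run containing no tag at all
    re = A gap B gap C gap D
  }
  {
    s = $0
    gsub(/\[a\]/, A, s);   gsub(/\[b\]/, B, s)
    gsub(/\[\/b\]/, C, s); gsub(/\[\/a\]/, D, s)
    while (match(s, re)) {
      m = substr(s, RSTART, RLENGTH)
      s = substr(s, RSTART + RLENGTH)
      # restore the original tags in the matched span
      gsub(A, "[a]", m);  gsub(B, "[b]", m)
      gsub(C, "[/b]", m); gsub(D, "[/a]", m)
      print m
    }
  }' "$@"
}
```

Because no tag may appear inside a gap, a span can only start at the last [a] before a [b], which is what the desired output above asks for (the leading [a]text1 is skipped).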