I have a log file with statistics like this:
2023-01-01 01:00:00 TOTAL MEMORY ALLOCATION CONSUMPTION:
2023-01-01 01:00:00 COMPONENT | USAGE (%)
2023-01-01 01:00:00 class.zzz.aaa.bbb | 32
2023-01-01 01:00:00 class.fff.aaa.ggg | 20
2023-01-01 01:00:00 TOTAL: 52% out of 100% allocated memory consumed
2023-01-01 01:00:00 TOTAL MEMORY ALLOCATION CONSUMPTION:
2023-01-02 01:00:00 COMPONENT | USAGE (%)
2023-01-02 01:00:00 class.xxx.aaa.bbb | 42
2023-01-02 01:00:00 class.bbb.aaa.zzz | 10
2023-01-02 01:00:00 class.zzz.xxx | 21
2023-01-02 01:00:00 class.xxx.sss.ggg | 5
2023-01-02 01:00:00 TOTAL: 78% out of 100% allocated memory consumed
2023-01-01 01:00:00 TOTAL MEMORY ALLOCATION CONSUMPTION:
2023-01-03 01:00:00 COMPONENT | USAGE (%)
2023-01-03 01:00:00 class.xxx.yyy.zzz | 10
2023-01-03 01:00:00 class.xxx.zzz.aaa | 20
2023-01-03 01:00:00 class.zzz.aaa.bbb | 30
2023-01-03 01:00:00 TOTAL: 60% out of 100% allocated memory consumed
and I would like to extract the last set of statistics (in the example above, that is the last 6 lines). As you can see, the number of lines in each section can vary, but the first and last lines of a section always have the same format. I was thinking about using:
- "TOTAL" as an anchor point to grab the first and the last line of the wanted block of text
- the (?s) mode (dot matches newline) to match all lines between those two
I ended up with this regex:
(?m)^.*?TOTAL(?s).*?(?m)TOTAL.*?$
and to use it on Linux, I ran the following command, using grep's -P regex option (I haven't had much luck with the -E regex option):
tac con.log | grep -Po "(?m)^.*?TOTAL(?s).*?(?m)TOTAL.*?\$" -m1 | tac
which produced this correct output:
2023-01-01 01:00:00 TOTAL MEMORY ALLOCATION CONSUMPTION:
2023-01-03 01:00:00 COMPONENT | USAGE (%)
2023-01-03 01:00:00 class.xxx.yyy.zzz | 10
2023-01-03 01:00:00 class.xxx.zzz.aaa | 20
2023-01-03 01:00:00 class.zzz.aaa.bbb | 30
2023-01-03 01:00:00 TOTAL: 60% out of 100% allocated memory consumed
This is as expected. However, this was in my testing environment, which uses an old grep version (2.5.3). When I tried it on my other machine running Rocky Linux 9, which uses grep version 3.6, I did not get any match. Since this regex also worked when I tested it at regex101.com, I believe this might be a nuance of newer grep versions. Is there anything special that newer versions of grep require for a regex like this to work, or is there another way to get this result? (Ultimately, it will be used in a bash script.)
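For reference, the result I am after can also be described without a multi-line regex. Below is a minimal sketch using awk, not a tested solution for either grep version. It assumes every block starts with a line containing "TOTAL MEMORY ALLOCATION CONSUMPTION:" and that the last block runs to the end of the file; the function name last_block is purely illustrative.

```shell
# last_block FILE: print the last statistics block found in FILE.
# Assumption: each block starts with a line containing
# "TOTAL MEMORY ALLOCATION CONSUMPTION:" and the final block ends at EOF.
last_block() {
    awk '
        # A header line starts a new block, so drop whatever was collected so far.
        /TOTAL MEMORY ALLOCATION CONSUMPTION:/ { buf = "" }
        # Append every line (including the header itself) to the current buffer.
        { buf = buf $0 ORS }
        # At end of input the buffer holds only the last block.
        END { printf "%s", buf }
    ' "$1"
}
```

Called as `last_block con.log`, this should print the final six lines of the example above. It reads the file once from the top, so tac is not needed.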