I have debug logs that are gigabytes in size and contain lots of extraneous data. A single log entry can be 1,000,000+ lines long; some parts are indented, some are not, and there is very little consistency apart from the timestamp at the start of each entry. Each new entry starts with a timestamp matching ^202[0-9]/[0-9]{2}/[0-9]{2} followed by the rest of the line, so the start of an entry is easy to identify, but it can be followed by a great many lines that belong to it.
I've been using Python to locate a search string, then move up to find the parent entry it belongs to and down to the end of that entry, which is where the next line matching ^202[0-9]/[0-9]{2}/[0-9]{2} is located. Unfortunately this is not nearly performant enough to make it a painless process. I'm now trying to get grep to do the same with a regex, since grep seems to be in a different universe in terms of speed. I also keep running into Python version differences (2 vs 3) on the machines I work on, which is a pain.
This is what I have so far for grep. It works on small test cases but not on large files, so there are obviously some performance issues with it. How can I resolve this? Perhaps there's a good way to do this with awk?
grep -E "(?i)^20[0-9]{2}\/[0-9]{2}\/[0-9]{2}[\s\S]+00:00:00:fc:77:00[\s\S]+?(?=^20[0-9]{2}\/[0-9]{2}\/[0-9]{2}|\Z)"
The key string I'm looking for is 00:00:00:fc:77:00.
Sample log:
2022/01/28 17:58:45.408 {Engine-Worker-08} <radiusItem.request-dump> Request packet dump:
Type=1, Ident=160, Len=54, Auth=7D 12 89 48 19 85 00 00 00 00 00 00 12 0C CC 22
...
hundreds of thousands of lines of nonsense that might have my search string in it, with little to no consistency
...
2022/01/28 17:58:45.408 {Engine-Worker-16} <radiusItem.request-dump> Request packet dump:
...
hundreds of thousands of lines of nonsense that might have my search string in it, with little to no consistency
...
2022/01/28 17:58:46.127 {TcpEngine-3} <tcp-service> Accept failure: Invalid Radius/TLS client 1.1.1.1, connection closed
2022/01/28 17:58:48.604 {Engine-Worker-60} [acct:callAcctBkgFlow] <engine.item.setup> Call method ==> acct:readAcctPropFile
...
hundreds of thousands of lines of nonsense that might have my search string in it, with little to no consistency
...
If any of these entries contains my search string, I want the whole entry: everything from its timestamp line up to (but not including) the next timestamp line, however many thousands of lines that is.
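To pin down the behavior described above, here is a minimal single-pass sketch in Python. It is only an illustration of the requirement, not a tuned solution: the names ENTRY_START and matching_entries are made up for this example, the match is assumed to be case-insensitive (as the (?i) in the grep attempt suggests), and it assumes one entry fits in memory, since the current entry is buffered until the next timestamp line is seen. Lines that appear before the first timestamp are treated as one entry of their own.

```python
import re

# Assumed entry delimiter: a line that starts with a date such as 2022/01/28.
ENTRY_START = re.compile(r"^20[0-9]{2}/[0-9]{2}/[0-9]{2}")


def matching_entries(lines, needle):
    """Yield each whole entry (as a list of lines) that contains needle.

    Hypothetical helper: reads the input once, front to back, and keeps
    only the current entry in memory instead of seeking up and down.
    """
    needle = needle.lower()  # case-insensitive comparison (assumption)
    entry = []               # lines of the entry currently being read
    found = False            # has needle been seen in the current entry?
    for line in lines:
        if ENTRY_START.match(line):
            # A new timestamp closes the previous entry.
            if found:
                yield entry
            entry = []
            found = False
        entry.append(line)
        if not found and needle in line.lower():
            found = True
    # The last entry has no following timestamp to close it.
    if found:
        yield entry
```

Any iterable of lines works as input, so an open file object can be passed directly, for example: for entry in matching_entries(open(path), "00:00:00:fc:77:00"): sys.stdout.writelines(entry), where path is whatever log file is being searched.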