Below is an example of some data I'm using. I've read a number of posts on this topic and have experimented for a while on regex101.
BotInfo[-]: Source IP:[10.1.1.100] Target Host:[CentOS70-1] Target OS:[CentOS 7.0] Description:[HTTP Connection Request] Details:[10.1.1.101 - - [28/May/2013:12:24:08 +0000] "GET /math/html.mli HTTP/1.0" 404 3567 "-" "-" ] Phase:[Access] Service:[WEB]
The goal is to have two capture groups: one for the tag (e.g. Source IP, Target Host, Description, etc.) and another for the content contained in the outermost square brackets.
It's the "outermost" that's getting me, because the content for the Details tag has square brackets in it.
Here is my current progress on said regex. I am using the /g flag:
\s?([^:]+):\[(.*?(?=\]\s.*?:\[))\]
This handles everything except the edge case (it's more complex than needed because I've been fiddling with trying to get the edge case to work).
My current lookahead, (?=\]\s.*?:\[), is meant, at a high level, to match the closing right bracket followed by the next tag. Another issue is that this fails at the last match, because there is no following tag.
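To illustrate the lookahead idea, here is a minimal Python sketch in which the lookahead accepts either the next tag or the end of the input, so the last field can still match. It assumes the record is a single line, that tags consist only of letters and spaces, and that the Details content never itself contains something shaped like `] Tag:[`. The names `LINE` and `FIELD` are hypothetical.

```python
import re

# Sample record from the question, assumed to be one line of input.
LINE = (
    'BotInfo[-]: Source IP:[10.1.1.100] Target Host:[CentOS70-1] '
    'Target OS:[CentOS 7.0] Description:[HTTP Connection Request] '
    'Details:[10.1.1.101 - - [28/May/2013:12:24:08 +0000] '
    '"GET /math/html.mli HTTP/1.0" 404 3567 "-" "-" ] '
    'Phase:[Access] Service:[WEB]'
)

FIELD = re.compile(
    r'([A-Za-z][A-Za-z ]*)'                # group 1: tag (letters and spaces, an assumption)
    r':\['                                 # opening of the outermost bracket
    r'(.*?)'                               # group 2: content, expanded lazily
    r'\]'                                  # candidate closing bracket
    r'(?=\s+[A-Za-z][A-Za-z ]*:\[|\s*$)'   # must be followed by the next tag OR end of input
)

pairs = FIELD.findall(LINE)
for tag, content in pairs:
    print(repr(tag), repr(content))
```

The inner `]` after `+0000` is rejected as a closing bracket because what follows it is neither a tag nor the end of the line, so the lazy group keeps expanding until the outermost `]`.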
Edit: An example of successful output was requested. Using the data provided, the goal is to have two capture groups resulting in these pairs:
MATCH 1
1. `Source IP`
2. `10.1.1.100`
MATCH 2
1. `Target Host`
2. `CentOS70-1`
MATCH 3
1. `Target OS`
2. `CentOS 7.0`
MATCH 4
1. `Description`
2. `HTTP Connection Request`
MATCH 5
1. `Details`
2. `10.1.1.101 - - [28/May/2013:12:24:08 +0000] "GET /math/html.mli HTTP/1.0" 404 3567 "-" "-" `
MATCH 6
1. `Phase`
2. `Access`
MATCH 7
1. `Service`
2. `WEB`
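For comparison, one way to obtain the pairs listed above without any lookahead is to track bracket depth by hand, which matches the "outermost" requirement directly. This is only a sketch: `parse_fields` is a hypothetical helper, and it assumes brackets inside a field are balanced and that any record prefix (such as `BotInfo[-]`) is separated from the first tag by `": "`.

```python
def parse_fields(line):
    """Return (tag, content) pairs, closing each field at its outermost ']'."""
    pairs = []
    i, n = 0, len(line)
    while i < n:
        start = line.find(":[", i)          # next "Tag:[" opener
        if start == -1:
            break
        # Tag is the text before ":["; drop a leading record prefix ending in ": " (assumption).
        tag = line[i:start].rsplit(": ", 1)[-1].strip()
        depth, j = 1, start + 2
        while j < n and depth > 0:          # walk until the matching outer "]"
            if line[j] == "[":
                depth += 1
            elif line[j] == "]":
                depth -= 1
            j += 1
        if depth != 0:                      # unbalanced brackets: stop rather than guess
            break
        pairs.append((tag, line[start + 2:j - 1]))
        i = j
    return pairs


LINE = (
    'BotInfo[-]: Source IP:[10.1.1.100] Target Host:[CentOS70-1] '
    'Target OS:[CentOS 7.0] Description:[HTTP Connection Request] '
    'Details:[10.1.1.101 - - [28/May/2013:12:24:08 +0000] '
    '"GET /math/html.mli HTTP/1.0" 404 3567 "-" "-" ] '
    'Phase:[Access] Service:[WEB]'
)

pairs = parse_fields(LINE)
for number, (tag, content) in enumerate(pairs, start=1):
    print(f"MATCH {number}: {tag!r} -> {content!r}")
```

Unlike a lookahead-based pattern, this does not depend on what follows the closing bracket, but it does depend on the nested brackets being balanced.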