1

This is for log events: I want to match certain fields that may be present in the data, but only in events that contain a certain string. In this example, that string is type="traffic".

Apparently I'm supposed to update this question instead of asking another one. Sorry I couldn't accept the answer to this; someone dinged me on points.

Thanks!

Here are sample events. I want to capture specific fields in any line that includes type="traffic", but not in lines that have type="utm".

Apr 29 11:08:16 10.150.148.52 date=2023-04-29 time=11:08:17 devname="ABCZWPFWFTG-B" devid="ABCVM8V0000158159" eventtime=1682791697444922720 tz="-0700" logid="1059028704" type="utm" subtype="app-ctrl" eventtype="signature" level="information" vd="root" appid=40568 srcip=10.150.150.10 dstip=20.62.63.153 srcport=55544 dstport=443 srcintf="port2" srcintfrole="lan" dstintf="port1" dstintfrole="wan" proto=6 service="SSL" direction="incoming" policyid=1 sessionid=4976047 applist="PROD-APPCTRL-AZURE" action="pass" appcat="Web.Client" app="HTTPS.BROWSER" hostname="124537f1-b52d-4e77-a6bd-e73c9904ea48.agentsvc.azure-automation.net" incidentserialno=205723388 url="/" msg="Web.Client: HTTPS.BROWSER," apprisk="medium" scertcname="*.azure-automation.net" scertissuer="Microsoft RSA TLS CA 01"
Apr 29 11:08:16 10.150.148.52 date=2023-04-29 time=11:08:17 devname="ABCZWPFWFTG-B" devid="ABCVM8V0000158159" eventtime=1682791697444901220 tz="-0700" logid="1059028704" type="utm" subtype="app-ctrl" eventtype="signature" level="information" vd="root" appid=15895 srcip=10.150.150.10 dstip=20.62.63.153 srcport=55544 dstport=443 srcintf="port2" srcintfrole="lan" dstintf="port1" dstintfrole="wan" proto=6 service="SSL" direction="outgoing" policyid=1 sessionid=4976047 applist="PROD-APPCTRL-AZURE" action="pass" appcat="Network.Service" app="SSL" hostname="124537f1-b52d-4e77-a6bd-e73c9904ea48.agentsvc.azure-automation.net" incidentserialno=205723383 url="/" msg="Network.Service: SSL," apprisk="elevated" scertcname="*.azure-automation.net" scertissuer="Microsoft RSA TLS CA 01"
Apr 29 11:08:16 10.150.148.52 date=2023-04-29 time=11:08:17 devname="ABCZWPFWFTG-B" devid="ABCVM8V0000158159" eventtime=1682791697444603820 tz="-0700" logid="1059028704" type="utm" subtype="app-ctrl" eventtype="signature" level="information" vd="root" appid=40568 srcip=45.42.34.136 dstip=10.150.148.104 srcport=60638 dstport=443 srcintf="port1" srcintfrole="wan" dstintf="port5" dstintfrole="dmz" proto=6 service="SSL" direction="incoming" policyid=10 sessionid=4976049 applist="PROD-APPCTRL-AZURE" action="pass" appcat="Web.Client" app="HTTPS.BROWSER" hostname="www.testdata.com" incidentserialno=205723390 url="/" msg="Web.Client: HTTPS.BROWSER," apprisk="medium" scertcname="www.testdata.com"
May 19 16:32:23 10.150.160.13 date=2023-05-19 time=16:32:25 devname="fw1-test-lv-external" devid="ABC1K5DT918800482" eventtime=1684539145135795404 tz="-0700" logid="0000000013" type="traffic" subtype="forward" level="notice" vd="PRODUCTION" srcip=10.150.161.11 srcport=64507 srcintf="ABCTCORPO3.3052" srcintfrole="lan" dstip=208.11.121.76 dstport=53 dstintf="ABCTINTPO1.3053" dstintfrole="wan" srccountry="Reserved" dstcountry="United States" sessionid=547154413 proto=17 action="accept" policyid=247 policytype="policy" poluuid="07588088-f351-51ec-153c-4a07e49c5818" policyname="Microsoft DNS to Umbrella" service="DNS" trandisp="snat" transip=38.70.139.3 transport=64507 duration=249 sentbyte=169 rcvdbyte=231 sentpkt=2 rcvdpkt=2 appcat="unscanned" srchwvendor="Cisco" devtype="Network" srcfamily="AP" osname="Cisco IOS" mastersrcmac="12:12:12:12:dc:27" srcmac="12:12:12:12:dc:27" srcserver=0
May 19 16:23:57 10.150.160.13 date=2023-05-19 time=16:23:58 devname="fw1-test-lv-external" devid="ABC1K5DT918800482" eventtime=1684538639125610717 tz="-0700" logid="0000000020" type="traffic" subtype="forward" level="notice" vd="PRODUCTION" srcip=10.150.161.11 srcport=63392 srcintf="ABCTCORPO3.3052" srcintfrole="lan" dstip=208.11.121.76 dstport=53 dstintf="ABCTINTPO1.3053" dstintfrole="wan" srccountry="Reserved" dstcountry="United States" sessionid=547134202 proto=17 action="accept" policyid=247 policytype="policy" poluuid="07588088-f351-51ec-153c-4a07e49c5818" policyname="Microsoft DNS to Umbrella" service="DNS" trandisp="snat" transip=38.70.139.3 transport=63392 duration=145 sentbyte=230 rcvdbyte=382 sentpkt=3 rcvdpkt=3 appcat="unscanned" sentdelta=230 rcvddelta=382 srchwvendor="Cisco" devtype="Network" srcfamily="AP" osname="Cisco IOS" mastersrcmac="12:12:12:12:dc:27" srcmac="12:12:12:12:dc:27" srcserver=0
May 19 16:25:35 10.132.119.14 date=2023-05-19 time=16:25:36 devname="FW1-testMAIN-ABCT01" devid="ABCT3KD3Z17800372" eventtime=1684538737153322514 tz="-0700" logid="0000000020" type="traffic" subtype="forward" level="notice" vd="PRODUCTION" srcip=10.151.143.4 srcport=50423 srcintf="port4" srcintfrole="lan" dstip=52.111.145.1 dstport=443 dstintf="port3" dstintfrole="undefined" srccountry="Reserved" dstinetsvc="Microsoft-Office365" dstcountry="United States" dstregion="California" dstcity="San Jose" dstreputation=5 sessionid=3673596551 proto=6 action="accept" policyid=10045 policytype="policy" poluuid="96f15028-15d6-51e9-6b81-d98bf1466b99" user="JULLOPEZ" authserver="FSSO_PSR" service="Microsoft-Office365" trandisp="snat" transip=199.68.152.135 transport=50423 appid=41468 app="Microsoft.Office.365.Portal" appcat="Collaboration" apprisk="elevated" applist="Edge-Prod-Block-Mode-P2P_PROXY" duration=30553 sentbyte=94401 rcvdbyte=112203 sentpkt=1052 rcvdpkt=1539 sentdelta=254 rcvddelta=230
May 19 16:26:00 10.150.160.13 date=2023-05-19 time=16:26:01 devname="fw1-test-lv-external" devid="ABC1K5DT918800482" eventtime=1684538762118706615 tz="-0700" logid="0000000020" type="traffic" subtype="forward" level="notice" vd="PRODUCTION" srcip=10.150.106.11 srcport=54254 srcintf="ABCTCORPO3.3052" srcintfrole="lan" dstip=17.188.143.10 dstport=443 dstintf="ABCTINTPO1.3053" dstintfrole="wan" srccountry="Reserved" dstcountry="United States" sessionid=513091673 proto=6 action="accept" policyid=83 policytype="policy" poluuid="2k2k2-b4c8-51e9-512e-62cf5b7e3bcd" policyname="Internal Server Nets Outbound" service="HTTPS" trandisp="snat" transip=38.70.139.3 transport=54254 appid=42662 app="Apple.Services" appcat="General.Interest" apprisk="elevated" applist="PROD-APPCTRL_LV-EXT" appact="detected" duration=686309 sentbyte=22070530 rcvdbyte=14199406 sentpkt=279649 rcvdpkt=148924 sentdelta=3600 rcvddelta=2352 srchwvendor="Cisco" devtype="Network" srcfamily="AP" osname="Cisco IOS" mastersrcmac="12:12:12:12:dc:27" srcmac="12:12:12:12:dc:27" srcserver=0
May 19 16:26:59 10.151.129.106 date=2023-05-19 time=16:27:00 devname="FW1-testPSR-DC" devid="ABCT3KD3Z17800305" eventtime=1684538820421783095 tz="-0700" logid="0000000013" type="traffic" subtype="forward" level="notice" vd="DataCenter" srcip=10.151.110.100 srcname="PSRPSOLAPP01.test.NET" identifier=2875 srcintf="Enterprise_ACI" srcintfrole="wan" dstip=10.132.116.4 dstname="10.132.116.4" dstintf="Enterprise" dstintfrole="lan" srccountry="Reserved" dstcountry="Reserved" sessionid=3796675657 proto=1 action="accept" policyid=10654 policytype="policy" poluuid="6ddcbe16-9058-51ec-2052-64e35cf6fddc" policyname="Solarwinds Catch-ALL" user="SVC-SOLARWINDS-IPAM" authserver="FSSO_PSR" service="PING" trandisp="noop" duration=60 sentbyte=59 rcvdbyte=59 sentpkt=1 rcvdpkt=1 appcat="unscanned"
May 19 16:33:13 10.150.148.52 date=2023-05-19 time=16:33:14 devname="ABCZWPFWFTG-B" devid="ABCVM8V0000158159" eventtime=1684539194871377700 tz="-0700" logid="0000000013" type="traffic" subtype="forward" level="notice" vd="root" srcip=10.151.100.36 identifier=18877 srcintf="TUNNEL_SCH" srcintfrole="undefined" dstip=10.150.148.52 dstintf="port2" dstintfrole="lan" srccountry="Reserved" dstcountry="Reserved" sessionid=105443335 proto=1 action="accept" policyid=8 policytype="policy" poluuid="06d7ce0e-e8ae-51ed-b77f-59e907ddba86" policyname="test TO AZURE LAN" service="icmp/8/0" trandisp="noop" appid=24466 app="Ping" appcat="Network.Service" apprisk="elevated" applist="PROD-APPCTRL-AZURE" duration=60 sentbyte=84 rcvdbyte=84 sentpkt=1 rcvdpkt=1 vpn="TUNNEL_SCH" vpntype="ipsec-static" utmaction="allow" countapp=1 masterdstmac="12:12:12:12:9a:bc" dstmac="12:12:12:12:9a:bc" dstserver=1

This regex is based on one of the answers. I rewrote it for my actual data, but it's not working correctly:

.*type="(anomaly|log|event|utm)".*(*SKIP)(*FAIL)|(^.{15})\s(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}).+?(devname=\S+)\s(devid=\S+).+?(?:vd=\H+)|(srcip=\H+)|(srcport=\H+)|(srcintf=\H+)|(dstip=\H+)|(dstport=\H+)|(dstintf=\H+)|(proto=\H+)|(action=\H+)|(policyid=\H+)|(user=\H+)|(service=\H+)|(transport=\H+)|(app=\H+)|(applist=\H+)|(vpn=\H+)|(vpntype="?\S+"?)|(?:\s+)|(?:\S+=".+?")|(?:\S+=\S+)

Here is a link to the regex example with the best solution so far:

https://regex101.com/r/BDkbMb/1

moliminous
  • To exclude `type=b` you can use [PCRE verbs `(*SKIP)(*F)`](https://stackoverflow.com/questions/24534782/how-do-skip-or-f-work-on-regex) for example like this: [`^.*?\btype=b\b.*(*SKIP)(*F)|.*?\b\Kfield\d+=\S+`](https://regex101.com/r/bKzHmm/1) (it also uses [`\K` to reset](http://www.rexegg.com/regex-php.html#K)). Or another [`\G`](https://www.regular-expressions.info/continue.html) and `\K` based idea to require `type=a` using a *lookahead* to attach matches: [`(?:\G(?!^)|^(?=.*?\btype=a\b)).*?\b\Kfield\d+=\S+`](https://regex101.com/r/mQ6Tfx/1) – bobble bubble Jun 09 '23 at 08:52
  • Something like this? https://regex101.com/r/k3SiEu/1 – The fourth bird Jun 09 '23 at 15:18
  • You updated the question, did it work out? – The fourth bird Jun 09 '23 at 15:39
  • I will check soon - crazy busy today sorry I just saw your comments – moliminous Jun 09 '23 at 16:21
  • I don't understand it yet but that looks promising with the PCRE verbs – moliminous Jun 09 '23 at 16:25
  • @Thefourthbird YES that actually worked, although ideally I would say "keep this" instead of "skip fail if that", but here is my final regex https://regex101.com/r/EM9jGG/1 – moliminous Jun 13 '23 at 01:14
  • Trying to do all the job with a pattern is in your case a poor approach that ends with an inelegant and complicated for nothing code. Use a programming language! Example with php: https://3v4l.org/mucuq#v8.2.7 – Casimir et Hippolyte Jun 18 '23 at 14:30

2 Answers

2

PCRE/PCRE2 (with \K and \G):

(?(DEFINE)                 # Subpattern declaration:
  (?<field>                # Match a field
    \b                     # with a key that is either
    (?:field1|field2)      # 'field1' or 'field2' (insert other field names here)
    =\S+                   # then a '=' and 1+ non-whitespace chars.
    \b                     #
  )                        #
)                          #
                           # Main pattern:
\g<field>(?=.*\btype=a\b)  # A field followed by anything then 'type=a'
|                          # or
(?:\btype=a\b|\G(?!^)).*?  # 'type=a' or the end of the last match, followed by anything,
\K                         # all of which we forfeit (not included in the match),
\g<field>                  # then a field.

Try it on regex101.com.

ECMAScript/.NET (with lookbehind):

\b(?:field[1234])=\S+\b  # Match a field
(?=.*\btype=a\b)         # followed by 'type=a'
|                        # or
(?<=\btype=a\b.*)        # preceded by 'type=a',
\b(?:field[1234])=\S+\b  # match a field.

Try it on regex101.com.
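
For engines without \K and \G, the same "require type=a, then collect the fields" logic can be split into two steps. The following is only a minimal Python sketch, not the pattern above: it swaps the single regex for a line filter followed by findall. The helper name extract_fields and the sample lines are made up for illustration.

```python
import re

# Field names mirror the (?:field1|field2) alternation above; add others as needed.
FIELD = re.compile(r'\b(?:field1|field2)=\S+')
# The marker a line must contain before any of its fields are kept.
WANTED_TYPE = re.compile(r'\btype=a\b')


def extract_fields(line):
    """Return the wanted key=value pairs, or [] if the line lacks type=a."""
    if not WANTED_TYPE.search(line):
        return []
    return FIELD.findall(line)


# Illustrative input only, not taken from the question.
sample = [
    "field1=1 type=a field2=2 field3=3",
    "field1=4 type=b field2=5",
]
for line in sample:
    print(extract_fields(line))
```

Filtering first is what lets the field pattern stay simple: it no longer has to know whether type=a comes before or after the field.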

InSync
  • Awesome solution! – David542 Jun 08 '23 at 23:53
  • Doesn't work on my actual data, added above. I tried accepting your answer and posting a new question with my real data but I got yelled at so I had to update this one, and my modified version of your regex doesn't work here for some reason I can't figure it out. – moliminous Jun 09 '23 at 15:32
  • @moliminous You were really close. Just drop the `\b` after `\S+` and `"traffic"`. – InSync Jun 09 '23 at 16:10
  • Actually this doesn't work - if you replace the 'type=a' with 'type=b' it incorrectly matches that line too – moliminous Jun 09 '23 at 16:19
  • @moliminous Fixed by adding `(?!^)` to right after `\G`. – InSync Jun 09 '23 at 16:23
  • Any way to have the fields be capture groups instead of non-captured groups? I need to refer to them later as variables for Group 1, Group2, etc – moliminous Jun 12 '23 at 22:36
  • @moliminous The main regex has two branches which both refer to `\g<field>`, so no; you can [substitute](https://regex101.com/r/1kwqf6/5) `\g<field>` with the definition itself and store the fields in two different groups. However, that's from a regex standpoint; I am not sure about your language's regex APIs, which may or may not return an array of matched content for each group. – InSync Jun 12 '23 at 22:42
  • It's not returning an array, it's discarding all the fields I want to capture because they're noncapturing groups. It is matching only certain events correctly, but I have to capture the fields instead of non-capture – moliminous Jun 12 '23 at 22:50
  • @moliminous Have you tried the regex I linked above? It should give you the fields, albeit in two groups instead of one. – InSync Jun 12 '23 at 22:52
  • Nothing linked above works. I need to capture (not just match) the field value pairs (fieldname=fieldvalue) as listed, match but discard everything else (non-capture group) – moliminous Jun 13 '23 at 00:17
  • @moliminous Do you mean something like [this](https://regex101.com/r/1kwqf6/6)? – InSync Jun 13 '23 at 00:20
  • This is closer but I also need to match all other characters so I can throw it away, I'm also afraid of the Group numbers being the same. https://regex101.com/r/bbQPqF/1 – moliminous Jun 13 '23 at 00:30
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/254052/discussion-between-insync-and-moliminous). – InSync Jun 13 '23 at 00:52
1

You can start by matching the leading date, time and IP address of the log line.

Then skip over all the key=value pairs that you don't want, and match only the ones whose key is one of the alternatives:

(?:^[A-Z][a-z]+\h+\d{1,2}\h+\d{2}:\d{2}:\d{2}\h+\d{1,3}(?:\.\d{1,3}){3}\b(?=.*?\btype="traffic")(?!.*\btype="utm")|\G(?!^))(?:(?!\h+(?:dev(?:name|id)|vd|src(?:ip|port|intf)|dst(?:ip|port|intf)|proto|action|policyid|user|service|transport|app(?:list)?|vpn(?:type)?)=)\h+\S+)*+\h*\K[^\s=]+=\S+

Explanation

  • (?: Non capture group for the alternatives
    • ^ Start of string
    • [A-Z][a-z]+\h+\d{1,2}\h+\d{2}:\d{2}:\d{2}\h+\d{1,3}(?:\.\d{1,3}){3}\b Match the leading date, time and IP-like format
    • (?=.*?\btype="traffic")(?!.*\btype="utm") Assert that to the right is type="traffic" and is not type="utm"
    • | Or
    • \G(?!^) Assert the current position at the end of the previous match, not at the start
  • ) Close the non capture group
  • (?: Non capture group
    • (?! Negative lookahead, assert that from the current position to the right is not
    • \h+ Match 1+ horizontal whitespace chars
    • (?:dev(?:name|id)|vd|src(?:ip|port|intf)|dst(?:ip|port|intf)|proto|action|policyid|user|service|transport|app(?:list)?|vpn(?:type)?)=) Match one of the alternatives
    • \h+\S+ Match 1+ horizontal whitespace chars and 1+ non whitespace chars
  • )*+ Close the non capture group and optionally repeat using a possessive quantifier
  • \h*\K Match optional horizontal whitespace chars and forget what is matched until now
  • [^\s=]+=\S+ Match the key value pair

Regex demo
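
If the fields are needed by name rather than as a sequence of matches, a two-step version in a programming language may be easier to work with. The following is a rough Python sketch under a few assumptions, not a port of the pattern above: Python's built-in re module does not support \G, \K or \h, so the line is filtered first and the wanted keys are then collected with findall. The function name parse_traffic, the dictionary keys "timestamp" and "host", and the stripping of surrounding quotes are choices made for this sketch. The sample lines are abbreviated from the events in the question.

```python
import re

# Leading syslog timestamp and IP, e.g. "May 19 16:32:23 10.150.160.13".
PREFIX = re.compile(
    r'^([A-Z][a-z]+\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+(\d{1,3}(?:\.\d{1,3}){3})\b'
)
# Same key list as the alternation in the pattern above. (?<!\S) makes sure
# the key starts a token, so "action" does not match inside "utmaction".
WANTED = re.compile(
    r'(?<!\S)(dev(?:name|id)|vd|src(?:ip|port|intf)|dst(?:ip|port|intf)|proto'
    r'|action|policyid|user|service|transport|app(?:list)?|vpn(?:type)?)'
    r'=("[^"]*"|\S+)'
)


def parse_traffic(line):
    """Return a dict of wanted fields, or None if the line is not type="traffic"."""
    if 'type="traffic"' not in line:
        return None
    prefix = PREFIX.match(line)
    if prefix is None:
        return None
    # Assumption for this sketch: surrounding double quotes are dropped.
    fields = {key: value.strip('"') for key, value in WANTED.findall(line)}
    return {"timestamp": prefix.group(1), "host": prefix.group(2), **fields}


# Abbreviated versions of two events from the question.
traffic = ('May 19 16:26:59 10.151.129.106 date=2023-05-19 time=16:27:00 '
           'devname="FW1-testPSR-DC" type="traffic" vd="DataCenter" '
           'srcip=10.151.110.100 policyname="Solarwinds Catch-ALL" '
           'user="SVC-SOLARWINDS-IPAM" service="PING" appcat="unscanned"')
utm = ('Apr 29 11:08:16 10.150.148.52 date=2023-04-29 time=11:08:17 '
       'devname="ABCZWPFWFTG-B" type="utm" srcip=10.150.150.10 service="SSL"')

print(parse_traffic(traffic))
print(parse_traffic(utm))
```

Returning a dictionary sidesteps the capture-group numbering problem raised in the comments, since each value is looked up by its key. Keys that are absent from an event are simply missing from the result.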

The fourth bird