
Any regex wizards able to help?

I'm trying to write a regex to parse the Suricata fast log. I found an old post with a pattern that partly works, but I would like to get all the data out of each log line.

So far I can get the date, time, source IP, source port, destination IP and destination port, but I would also like to get the alert title, classification and priority.

Log file:

03/21/2021-20:24:02.524057  [**] [1:2006380:14] ET POLICY Outgoing Basic Auth Base64 HTTP Password detected unencrypted [**] [Classification: Potential Corporate Privacy Violation] [Priority: 1] {TCP} 192.168.10.14:48820 -> 192.168.10.18:8086
03/21/2021-20:24:23.567546  [**] [1:2014939:5] ET POLICY DNS Query for TOR Hidden Domain .onion Accessible Via TOR [**] [Classification: Potential Corporate Privacy Violation] [Priority: 1] {UDP} 192.168.10.14:49405 -> 192.168.10.1:53

Python file:

import re

# Compile once; the raw string keeps escapes such as \s and \d intact.
pattern = re.compile(r'([0-9/]+)-([0-9:.]+)\s+.*?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})\s+->\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})')

with open('fast.log', 'r') as log_file:
    for line in log_file:
        r_search = pattern.search(line)
        if r_search is None:
            continue  # skip blank or non-matching lines instead of raising AttributeError
        print(f'Date - {r_search.group(1)}')
        print(f'Time - {r_search.group(2)}')
        print(f'Scr IP - {r_search.group(3)}')
        print(f'Scr Port - {r_search.group(4)}')
        print(f'Dess IP - {r_search.group(5)}')
        print(f'Dess Port - {r_search.group(6)}')
        print('***********')

Current output:

Date - 03/21/2021
Time - 20:24:02.524057
Scr IP - 192.168.10.14
Scr Port - 48820
Dess IP - 192.168.10.18
Dess Port - 8086
***********

Wanted Output:

Date - 03/21/2021
Time - 20:24:02.524057
Alert Rule - ET POLICY Outgoing Basic Auth Base64 HTTP Password detected unencrypted
Classification - Potential Corporate Privacy Violation
Priority - 1
Scr IP - 192.168.10.14
Scr Port - 48820
Dess IP - 192.168.10.18
Dess Port - 8086
***********

Thanks!

Dhruvan Ganesh

1 Answer


The following regex pattern seems to be working here:

import re

logs = ['03/21/2021-20:24:02.524057  [**] [1:2006380:14] ET POLICY Outgoing Basic Auth Base64 HTTP Password detected unencrypted [**] [Classification: Potential Corporate Privacy Violation] [Priority: 1] {TCP} 192.168.10.14:48820 -> 192.168.10.18:8086', '03/21/2021-20:24:23.567546  [**] [1:2014939:5] ET POLICY DNS Query for TOR Hidden Domain .onion Accessible Via TOR [**] [Classification: Potential Corporate Privacy Violation] [Priority: 1] {UDP} 192.168.10.14:49405 -> 192.168.10.1:53']
for log in logs:
    matches = re.findall(r'^(.*?)-(\S+)\s+\[.*?\]\s+\[.*?\]\s+(.*?)\s+\[.*?\]\s+\[(.*?)\]\s+\[(.*?)\].*?(\d+(?:\.\d+)*):(\d+)\s+->\s+(\d+(?:\.\d+)*):(\d+).*$', log)
    print(matches)

This prints (output wrapped here for readability):

[('03/21/2021',
  '20:24:02.524057',
  'ET POLICY Outgoing Basic Auth Base64 HTTP Password detected unencrypted',
  'Classification: Potential Corporate Privacy Violation',
  'Priority: 1',
  '192.168.10.14',
  '48820',
  '192.168.10.18',
  '8086')]
[('03/21/2021',
  '20:24:23.567546',
  'ET POLICY DNS Query for TOR Hidden Domain .onion Accessible Via TOR',
  'Classification: Potential Corporate Privacy Violation',
  'Priority: 1',
  '192.168.10.14',
  '49405',
  '192.168.10.1',
  '53')]
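
The output above still carries the "Classification: " and "Priority: " prefixes, whereas the question asks for the bare values. One possible variation, sketched below, uses named groups and re.VERBOSE so each piece of the pattern is labelled. The names FAST_LOG_RE and parse_fast_line are hypothetical, and the layout is assumed only from the two sample lines in the question, so lines that differ from them (for example IPv6 addresses) would simply return None.

```python
import re

# Hypothetical pattern; the field layout is inferred from the two sample lines only.
FAST_LOG_RE = re.compile(
    r"""
    ^(?P<date>\d{2}/\d{2}/\d{4})-(?P<time>[\d:.]+)\s+        # date-time stamp
    \[\*\*\]\s+\[[\d:]+\]\s+                                 # opening marker and rule id
    (?P<rule>.*?)\s+\[\*\*\]\s+                              # alert text up to the closing marker
    \[Classification:\s*(?P<classification>[^\]]*)\]\s+      # value only, prefix dropped
    \[Priority:\s*(?P<priority>\d+)\]\s+                     # value only, prefix dropped
    \{(?P<proto>[^}]+)\}\s+                                  # protocol in braces
    (?P<src_ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<src_port>\d+)    # source IPv4 and port
    \s+->\s+
    (?P<dst_ip>\d{1,3}(?:\.\d{1,3}){3}):(?P<dst_port>\d+)    # destination IPv4 and port
    """,
    re.VERBOSE,
)


def parse_fast_line(line):
    """Return a dict of named fields for one log line, or None if it does not match."""
    m = FAST_LOG_RE.match(line)
    return m.groupdict() if m else None


# First sample line from the question, split only for readability.
sample = ('03/21/2021-20:24:02.524057  [**] [1:2006380:14] ET POLICY Outgoing Basic Auth '
          'Base64 HTTP Password detected unencrypted [**] '
          '[Classification: Potential Corporate Privacy Violation] [Priority: 1] '
          '{TCP} 192.168.10.14:48820 -> 192.168.10.18:8086')

fields = parse_fast_line(sample)
if fields is not None:
    for name, value in fields.items():
        print(f'{name} - {value}')
```

Returning None for a non-matching line lets the caller decide whether to skip or log it, rather than failing on the first unexpected line in the file.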
Tim Biegeleisen