Parsing Snort Alert File with Regex

Question

I'm trying to use regex in Python to parse the source and destination (IPs and ports) and the timestamp from a Snort alert file. An example line:

03/09-14:10:43.323717  [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] [Classification: A Network Trojan was detected] [Priority: 1] {TCP} 172.16.116.194:28692 -> 205.181.112.65:80

I have a regex for the IP, but it doesn't match because of the port appended to the IP. How can I capture the port separately from the IP?

^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$

Remove the anchors `^` and `$` and try again; that will capture the IP. – rock321987, Jul 03 '16 at 13:11
New scenario: what about lines without ports? For example: `03/09-15:32:15.537934 [**] [1:2100366:8] GPL ICMP_INFO PING *NIX [**] [Classification: Misc activity] [Priority: 3] {ICMP} 172.16.114.50 -> 172.16.114.148` – user3498593, Jul 09 '16 at 22:18

score 3 · Answer 1 · answered Jul 03 '16 at 13:23

This should extract the necessary parts from the full line:

r'([0-9:./-]+)\s+.*?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})\s+->\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})'

See this example:

In [22]: line = '03/09-14:10:43.323717  [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] [Classification: A Network Trojan was detected] [Priority: 1] {TCP} 172.16.116.194:28692 -> 205.181.112.65:80'

In [23]: m = re.match(r'([0-9:./-]+)\s+.*?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})\s+->\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})', line)

In [24]: m.group(1)
Out[24]: '03/09-14:10:43.323717'

In [25]: m.group(2)
Out[25]: '172.16.116.194'

In [26]: m.group(3)
Out[26]: '28692'

In [27]: m.group(4)
Out[27]: '205.181.112.65'

In [28]: m.group(5)
Out[28]: '80'
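
The session above assumes `re` is already imported and that the line actually matches. As a minimal sketch, the same pattern can be wrapped in a function that returns `None` instead of raising when a line does not match. The function name `parse_alert` and the group names are illustrative and not part of the answer.

```python
import re

# Same pattern as in the answer, split across lines and given named groups.
ALERT_RE = re.compile(
    r'(?P<timestamp>[0-9:./-]+)\s+.*?'
    r'(?P<src_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(?P<src_port>\d{1,5})'
    r'\s+->\s+'
    r'(?P<dst_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(?P<dst_port>\d{1,5})'
)

def parse_alert(line):
    """Return a dict of the captured fields, or None if the line does not match."""
    m = ALERT_RE.match(line)
    return m.groupdict() if m else None

line = '03/09-14:10:43.323717  [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] [Classification: A Network Trojan was detected] [Priority: 1] {TCP} 172.16.116.194:28692 -> 205.181.112.65:80'

print(parse_alert(line))
print(parse_alert('not an alert line'))  # None
```

Checking the return value for `None` before reading any field avoids the `AttributeError` that `m.group(...)` raises when `re.match` finds nothing.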

Will

Great! Splitting out the time into a separate entity would just be another group correct? – user3498593 Jul 03 '16 at 14:22
Right, just change `([0-9:./-]+)` to `([0-9/]+)-([0-9:.]+)`. – Will Jul 03 '16 at 14:24
Only remaining piece is to remove the microseconds from the timestamp. I thought I could do this with strftime, but it doesn't work like I want because the input string time format doesn't match the output string format. – user3498593 Jul 04 '16 at 12:24
It reads through a text file. What if one of those group fields doesn't return anything? For example, there are some IPs that have no ports associated with them. I'm running into an issue where I get a NoneType error when I hit one of those. – user3498593 Jul 08 '16 at 02:15
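
The follow-up questions in these comments (separate date and time, dropping the microseconds, and lines without ports) can be handled together. The following is a sketch under assumptions rather than a tested solution for every Snort output format: the name `parse_alert` and the group names are hypothetical, the ports are made optional with `(?::(...))?`, and the microseconds are removed by plain string splitting instead of `strftime`.

```python
import re

ALERT_RE = re.compile(
    r'(?P<date>[0-9/]+)-(?P<time>[0-9:.]+)\s+.*?'
    # The port is wrapped in an optional non-capturing group, so ICMP lines still match.
    r'(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})(?::(?P<src_port>\d{1,5}))?'
    r'\s+->\s+'
    r'(?P<dst_ip>\d{1,3}(?:\.\d{1,3}){3})(?::(?P<dst_port>\d{1,5}))?'
)

def parse_alert(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    m = ALERT_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()  # groups that did not participate are None
    # Keep only HH:MM:SS by cutting at the first '.'.
    fields['time'] = fields['time'].split('.', 1)[0]
    return fields

tcp = '03/09-14:10:43.323717  [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] [Classification: A Network Trojan was detected] [Priority: 1] {TCP} 172.16.116.194:28692 -> 205.181.112.65:80'
icmp = '03/09-15:32:15.537934 [**] [1:2100366:8] GPL ICMP_INFO PING *NIX [**] [Classification: Misc activity] [Priority: 3] {ICMP} 172.16.114.50 -> 172.16.114.148'

for line in (tcp, icmp):
    print(parse_alert(line))
```

When reading a whole file, skipping lines for which `parse_alert` returns `None` and treating a `None` port as "no port" avoids the NoneType error described above.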

score 1 · Answer 2 · answered Jul 03 '16 at 13:16

You can separate them into different capture groups this way:

(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})

Dropping both `^` and `$` lets the pattern match in the middle of a line instead of only matching when the IP is the entire line.
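
To illustrate the effect of the anchors, here is a small sketch comparing the anchored pattern from the question with the unanchored one above. The shortened `line` is only the tail of the sample alert, used for brevity.

```python
import re

line = '{TCP} 172.16.116.194:28692 -> 205.181.112.65:80'

# Anchored: only matches when the whole string is a bare IP address.
anchored = re.compile(r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$')
# Unanchored, with separate groups for IP and port.
unanchored = re.compile(r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})')

print(anchored.search(line))     # None
print(unanchored.findall(line))  # [('172.16.116.194', '28692'), ('205.181.112.65', '80')]
```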

Yaron

tobi-wan-kenobi · Answer 3 · 2016-07-03T14:09:16.830

If I understand you correctly, you want to capture the IPs and the ports separately, right?

In that case, using "groups" in the regular expression would solve your problem:

result = re.search(r'((\d{1,3}\.){3}\d{1,3}):(\d{1,5})', input)

Now, result.group(1) contains the IP address and result.group(3) the port.
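
The port lands in group 3 rather than group 2 because the inner `(\d{1,3}\.)` is itself a capturing group. A short sketch showing this, plus a variant with a non-capturing group `(?:...)` so that the IP and port become groups 1 and 2 (the variable name `text` is used here to avoid shadowing the built-in `input`):

```python
import re

text = '{TCP} 172.16.116.194:28692 -> 205.181.112.65:80'

# Pattern from the answer: group 2 holds the last repetition of the inner group.
result = re.search(r'((\d{1,3}\.){3}\d{1,3}):(\d{1,5})', text)
print(result.groups())  # ('172.16.116.194', '116.', '28692')

# Variant with a non-capturing inner group: only IP and port are captured.
result2 = re.search(r'((?:\d{1,3}\.){3}\d{1,3}):(\d{1,5})', text)
print(result2.groups())  # ('172.16.116.194', '28692')
```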

edited Jul 03 '16 at 14:09

answered Jul 03 '16 at 13:17

tobi-wan-kenobi

score 1 · Answer 4 · answered Jul 03 '16 at 14:15

Description

^((?:[0-9]{2}[-\/:.]){5}[0-9]{6}).*[{]TCP[}]\s*(((?:[0-9]{1,3}[.]){1,3}[0-9]{1,3}):([0-9]{1,6}))\s*->\s*(((?:[0-9]{1,3}[.]){1,3}[0-9]{1,3}):([0-9]{1,6}))

This regular expression will do the following:

Captures the timestamp into capture group 1
Captures the source IP address and port into capture groups 2, 3, 4
Captures the destination IP address and port into capture groups 5, 6, 7
Requires the source and destination IP addresses to be preceded by {TCP}, in case the message itself also contains an IP address.

Example

Live Demo

https://regex101.com/r/hD4fW8/1

Sample text

03/09-14:10:43.323717  [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] [Classification: A Network Trojan was detected] [Priority: 1] {TCP} 172.16.116.194:28692 -> 205.181.112.65:80

Sample Matches

MATCH 1
1.  [0-21]  `03/09-14:10:43.323717`
2.  [145-165]   `172.16.116.194:28692`
3.  [145-159]   `172.16.116.194`
4.  [160-165]   `28692`
5.  [169-186]   `205.181.112.65:80`
6.  [169-183]   `205.181.112.65`
7.  [184-186]   `80`
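
The matches listed above can be reproduced in Python. This is a minimal sketch that assumes the sample line is copied exactly, including the two spaces after the timestamp, since the offsets depend on it.

```python
import re

pattern = re.compile(
    r'^((?:[0-9]{2}[-\/:.]){5}[0-9]{6}).*[{]TCP[}]\s*'
    r'(((?:[0-9]{1,3}[.]){1,3}[0-9]{1,3}):([0-9]{1,6}))'
    r'\s*->\s*'
    r'(((?:[0-9]{1,3}[.]){1,3}[0-9]{1,3}):([0-9]{1,6}))'
)

line = '03/09-14:10:43.323717  [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] [Classification: A Network Trojan was detected] [Priority: 1] {TCP} 172.16.116.194:28692 -> 205.181.112.65:80'

m = pattern.search(line)
# Print each group number with its (start, end) offsets and captured text.
for n in range(1, 8):
    print(n, m.span(n), m.group(n))
```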

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    (?:                      group, but do not capture (5 times):
----------------------------------------------------------------------
      [0-9]{2}                 any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
      [-\/:.]                  any character of: '-', '\/', ':', '.'
----------------------------------------------------------------------
    ){5}                     end of grouping
----------------------------------------------------------------------
    [0-9]{6}                 any character of: '0' to '9' (6 times)
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  [{]                      any character of: '{'
----------------------------------------------------------------------
  TCP                      'TCP'
----------------------------------------------------------------------
  [}]                      any character of: '}'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    (                        group and capture to \3:
----------------------------------------------------------------------
      (?:                      group, but do not capture (between 1
                               and 3 times (matching the most amount
                               possible)):
----------------------------------------------------------------------
        [0-9]{1,3}               any character of: '0' to '9'
                                 (between 1 and 3 times (matching the
                                 most amount possible))
----------------------------------------------------------------------
        [.]                      any character of: '.'
----------------------------------------------------------------------
      ){1,3}                   end of grouping
----------------------------------------------------------------------
      [0-9]{1,3}               any character of: '0' to '9' (between
                               1 and 3 times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of \3
----------------------------------------------------------------------
    :                        ':'
----------------------------------------------------------------------
    (                        group and capture to \4:
----------------------------------------------------------------------
      [0-9]{1,6}               any character of: '0' to '9' (between
                               1 and 6 times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of \4
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  ->                       '->'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \5:
----------------------------------------------------------------------
    (                        group and capture to \6:
----------------------------------------------------------------------
      (?:                      group, but do not capture (between 1
                               and 3 times (matching the most amount
                               possible)):
----------------------------------------------------------------------
        [0-9]{1,3}               any character of: '0' to '9'
                                 (between 1 and 3 times (matching the
                                 most amount possible))
----------------------------------------------------------------------
        [.]                      any character of: '.'
----------------------------------------------------------------------
      ){1,3}                   end of grouping
----------------------------------------------------------------------
      [0-9]{1,3}               any character of: '0' to '9' (between
                               1 and 3 times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of \6
----------------------------------------------------------------------
    :                        ':'
----------------------------------------------------------------------
    (                        group and capture to \7:
----------------------------------------------------------------------
      [0-9]{1,6}               any character of: '0' to '9' (between
                               1 and 6 times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of \7
----------------------------------------------------------------------
  )                        end of \5
----------------------------------------------------------------------
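
In Python, the node-by-node explanation above can be kept next to the pattern itself by compiling it with `re.VERBOSE`, which ignores whitespace in the pattern and allows `#` comments. The following is a sketch of that idea, not part of the original answer; the name `ALERT_RE` is illustrative.

```python
import re

ALERT_RE = re.compile(r'''
    ^
    ((?:[0-9]{2}[-/:.]){5}[0-9]{6})         # 1: timestamp, e.g. 03/09-14:10:43.323717
    .*[{]TCP[}]\s*                          # skip ahead to the {TCP} marker
    (                                       # 2: source IP and port together
      ((?:[0-9]{1,3}[.]){1,3}[0-9]{1,3})    # 3: source IP
      :
      ([0-9]{1,6})                          # 4: source port
    )
    \s*->\s*
    (                                       # 5: destination IP and port together
      ((?:[0-9]{1,3}[.]){1,3}[0-9]{1,3})    # 6: destination IP
      :
      ([0-9]{1,6})                          # 7: destination port
    )
''', re.VERBOSE)

line = '03/09-14:10:43.323717  [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] [Classification: A Network Trojan was detected] [Priority: 1] {TCP} 172.16.116.194:28692 -> 205.181.112.65:80'

m = ALERT_RE.search(line)
print(m.group(1), m.group(3), m.group(4), m.group(6), m.group(7))
```

Note that, as in the original pattern, `{1,3}` on the octet group and `{1,6}` on the port are looser than a strict IPv4 address and port would require.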
