2

I'm trying to use regex in Python to parse out the source, destination (IPs and ports) and the time stamp from a snort alert file. Example as below:

03/09-14:10:43.323717  [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] [Classification: A Network Trojan was detected] [Priority: 1] {TCP} 172.16.116.194:28692 -> 205.181.112.65:80

I have a regex for the IP, but it doesn't match because a port is attached to each IP. How can I capture the port separately from the IP?

^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$
Will
user3498593
  • Remove the anchors `^` and `$` and try again; that will capture the IP. – rock321987 Jul 03 '16 at 13:11
  • New scenario, what about without the ports? As so: `03/09-15:32:15.537934 [**] [1:2100366:8] GPL ICMP_INFO PING *NIX [**] [Classification: Misc activity] [Priority: 3] {ICMP} 172.16.114.50 -> 172.16.114.148` – user3498593 Jul 09 '16 at 22:18

4 Answers

3

This should extract the necessary parts from the full line:

r'([0-9:./-]+)\s+.*?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})\s+->\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})'

See this example:

In [21]: import re

In [22]: line = '03/09-14:10:43.323717  [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] [Classification: A Network Trojan was detected] [Priority: 1] {TCP} 172.16.116.194:28692 -> 205.181.112.65:80'

In [23]: m = re.match(r'([0-9:./-]+)\s+.*?(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})\s+->\s+(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})', line)

In [24]: m.group(1)
Out[24]: '03/09-14:10:43.323717'

In [25]: m.group(2)
Out[25]: '172.16.116.194'

In [26]: m.group(3)
Out[26]: '28692'

In [27]: m.group(4)
Out[27]: '205.181.112.65'

In [28]: m.group(5)
Out[28]: '80'
Will
  • Great! Splitting out the time into a separate entity would just be another group, correct? – user3498593 Jul 03 '16 at 14:22
  • Right, just change `([0-9:./-]+)` to `([0-9/]+)-([0-9:.]+)`. – Will Jul 03 '16 at 14:24
  • Only remaining piece is to remove the microseconds from the timestamp. I thought I could do this with strftime, but it doesn't work like I want because the input string time format doesn't match the output string format. – user3498593 Jul 04 '16 at 12:24
  • It reads through a text file. What if one of those group fields doesn't return anything? For example, there are some IPs that have no ports associated with them. I'm running into an issue where I get a NoneType error when I hit one of those. – user3498593 Jul 08 '16 at 02:15
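
The follow-up comments raise three points: splitting the date from the time, dropping the microseconds, and avoiding the NoneType error on lines that have no ports (such as the ICMP example). A minimal sketch of one way to handle all three, assuming the alert layout shown in the question. The names `ALERT_RE` and `parse_alert` and the named groups are hypothetical, chosen only for illustration:

```python
import re

# Ports are optional so that ICMP-style lines ("ip -> ip") still match.
# The microseconds are matched outside the "time" group, so they are dropped.
ALERT_RE = re.compile(
    r'(?P<date>[0-9/]+)-(?P<time>[0-9:]+)(?:\.\d+)?\s+.*?'
    r'(?P<src_ip>\d{1,3}(?:\.\d{1,3}){3})(?::(?P<src_port>\d{1,5}))?'
    r'\s+->\s+'
    r'(?P<dst_ip>\d{1,3}(?:\.\d{1,3}){3})(?::(?P<dst_port>\d{1,5}))?'
)


def parse_alert(line):
    """Return a dict of captured fields, or None if the line does not match."""
    m = ALERT_RE.match(line)
    if m is None:
        # Checking here avoids calling .group() on None later.
        return None
    # Groups that did not participate (missing ports) come back as None.
    return m.groupdict()


tcp_line = ('03/09-14:10:43.323717  [**] [1:2008015:9] ET MALWARE User-Agent (Win95) '
            '[**] [Classification: A Network Trojan was detected] [Priority: 1] '
            '{TCP} 172.16.116.194:28692 -> 205.181.112.65:80')
icmp_line = ('03/09-15:32:15.537934 [**] [1:2100366:8] GPL ICMP_INFO PING *NIX '
             '[**] [Classification: Misc activity] [Priority: 3] '
             '{ICMP} 172.16.114.50 -> 172.16.114.148')

for alert_line in (tcp_line, icmp_line, 'not an alert'):
    print(parse_alert(alert_line))
```

When reading a file, any line for which `parse_alert` returns None can simply be skipped, and a port value of None signals that the line carried bare IPs.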
1

You can separate them into different capture groups this way:

(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})

Dropping both ^ and $ lets the pattern match in the middle of a line instead of requiring the whole line to be a bare IP.
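
To illustrate the difference, here is a small sketch. The `line` value is a shortened tail of the sample alert, used only to keep the example compact:

```python
import re

# Shortened tail of the sample alert (assumption: only this part matters here).
line = '{TCP} 172.16.116.194:28692 -> 205.181.112.65:80'

anchored = r'^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$'
unanchored = r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})'

# The anchored pattern needs the entire string to be one bare IP, so it fails.
anchored_result = re.search(anchored, line)
print(anchored_result)  # None

# Without anchors, findall returns one (ip, port) tuple per endpoint.
endpoints = re.findall(unanchored, line)
print(endpoints)  # [('172.16.116.194', '28692'), ('205.181.112.65', '80')]
```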

Yaron
1

If I understand you correctly, you want to capture the IPs and the ports separately, right?

In that case, using "groups" in the regular expression would solve your problem:

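result = re.search(r'((\d{1,3}\.){3}\d{1,3}):(\d{1,5})', line)

Here line is the alert text (naming it input would shadow Python's built-in). Now, result.group(1) contains the IP address and result.group(3) the port. result.group(2) holds only the last repetition of the inner group (for example '116.') and can be ignored.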
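
If the extra group is unwanted, the inner group can be made non-capturing with `(?:...)`, which renumbers the port to group 2. A short sketch comparing the two forms, again using a shortened tail of the sample alert as the input:

```python
import re

# Shortened tail of the sample alert (assumption: used only for illustration).
line = '{TCP} 172.16.116.194:28692 -> 205.181.112.65:80'

# Capturing inner group: a repeated group keeps only its last iteration.
capturing = re.search(r'((\d{1,3}\.){3}\d{1,3}):(\d{1,5})', line)
print(capturing.groups())      # ('172.16.116.194', '116.', '28692')

# Non-capturing inner group: only the IP and the port are captured.
non_capturing = re.search(r'((?:\d{1,3}\.){3}\d{1,3}):(\d{1,5})', line)
print(non_capturing.groups())  # ('172.16.116.194', '28692')
```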

1

Description

^((?:[0-9]{2}[-\/:.]){5}[0-9]{6}).*[{]TCP[}]\s*(((?:[0-9]{1,3}[.]){1,3}[0-9]{1,3}):([0-9]{1,6}))\s*->\s*(((?:[0-9]{1,3}[.]){1,3}[0-9]{1,3}):([0-9]{1,6}))

This regular expression will do the following:

  • Captures the timestamp into capture group 1
  • Captures the source into capture groups 2 (IP:port), 3 (IP address only) and 4 (port only)
  • Captures the destination into capture groups 5 (IP:port), 6 (IP address only) and 7 (port only)
  • Requires the source and destination addresses to be preceded by {TCP}, in case the message text also contains an IP address

Example

Live Demo

https://regex101.com/r/hD4fW8/1

Sample text

03/09-14:10:43.323717  [**] [1:2008015:9] ET MALWARE User-Agent (Win95) [**] [Classification: A Network Trojan was detected] [Priority: 1] {TCP} 172.16.116.194:28692 -> 205.181.112.65:80

Sample Matches

MATCH 1
1.  [0-21]  `03/09-14:10:43.323717`
2.  [145-165]   `172.16.116.194:28692`
3.  [145-159]   `172.16.116.194`
4.  [160-165]   `28692`
5.  [169-186]   `205.181.112.65:80`
6.  [169-183]   `205.181.112.65`
7.  [184-186]   `80`
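
Since the question is about Python, the matches above can be reproduced with the `re` module. This is a minimal sketch: the names `PATTERN`, `line` and the unpacked variables are chosen only for illustration. Note that because the expression requires the literal `{TCP}`, lines tagged `{UDP}` or `{ICMP}` will not match it.

```python
import re

# The expression from this answer, split across lines for readability.
PATTERN = re.compile(
    r'^((?:[0-9]{2}[-\/:.]){5}[0-9]{6}).*[{]TCP[}]\s*'
    r'(((?:[0-9]{1,3}[.]){1,3}[0-9]{1,3}):([0-9]{1,6}))\s*->\s*'
    r'(((?:[0-9]{1,3}[.]){1,3}[0-9]{1,3}):([0-9]{1,6}))'
)

line = ('03/09-14:10:43.323717  [**] [1:2008015:9] ET MALWARE User-Agent (Win95) '
        '[**] [Classification: A Network Trojan was detected] [Priority: 1] '
        '{TCP} 172.16.116.194:28692 -> 205.181.112.65:80')

m = PATTERN.match(line)
if m:
    # Groups 1..7 in order, as listed under "Sample Matches".
    timestamp, src, src_ip, src_port, dst, dst_ip, dst_port = m.groups()
    print(timestamp, src_ip, src_port, dst_ip, dst_port)
    # 03/09-14:10:43.323717 172.16.116.194 28692 205.181.112.65 80
```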

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    (?:                      group, but do not capture (5 times):
----------------------------------------------------------------------
      [0-9]{2}                 any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
      [-\/:.]                  any character of: '-', '\/', ':', '.'
----------------------------------------------------------------------
    ){5}                     end of grouping
----------------------------------------------------------------------
    [0-9]{6}                 any character of: '0' to '9' (6 times)
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  [{]                      any character of: '{'
----------------------------------------------------------------------
  TCP                      'TCP'
----------------------------------------------------------------------
  [}]                      any character of: '}'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    (                        group and capture to \3:
----------------------------------------------------------------------
      (?:                      group, but do not capture (between 1
                               and 3 times (matching the most amount
                               possible)):
----------------------------------------------------------------------
        [0-9]{1,3}               any character of: '0' to '9'
                                 (between 1 and 3 times (matching the
                                 most amount possible))
----------------------------------------------------------------------
        [.]                      any character of: '.'
----------------------------------------------------------------------
      ){1,3}                   end of grouping
----------------------------------------------------------------------
      [0-9]{1,3}               any character of: '0' to '9' (between
                               1 and 3 times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of \3
----------------------------------------------------------------------
    :                        ':'
----------------------------------------------------------------------
    (                        group and capture to \4:
----------------------------------------------------------------------
      [0-9]{1,6}               any character of: '0' to '9' (between
                               1 and 6 times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of \4
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  ->                       '->'
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \5:
----------------------------------------------------------------------
    (                        group and capture to \6:
----------------------------------------------------------------------
      (?:                      group, but do not capture (between 1
                               and 3 times (matching the most amount
                               possible)):
----------------------------------------------------------------------
        [0-9]{1,3}               any character of: '0' to '9'
                                 (between 1 and 3 times (matching the
                                 most amount possible))
----------------------------------------------------------------------
        [.]                      any character of: '.'
----------------------------------------------------------------------
      ){1,3}                   end of grouping
----------------------------------------------------------------------
      [0-9]{1,3}               any character of: '0' to '9' (between
                               1 and 3 times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of \6
----------------------------------------------------------------------
    :                        ':'
----------------------------------------------------------------------
    (                        group and capture to \7:
----------------------------------------------------------------------
      [0-9]{1,6}               any character of: '0' to '9' (between
                               1 and 6 times (matching the most
                               amount possible))
----------------------------------------------------------------------
    )                        end of \7
----------------------------------------------------------------------
  )                        end of \5
----------------------------------------------------------------------
Ro Yo Mi