
I have the string below in my iptables logs and I want to parse its full content. My current regex parses about 90% of it, but I need the entire log line.

My python regex:

regex = re.compile('([^ ]+)=([^ ]+)')

I also need these parts:

Aug 13 17:16:33 app-srv01 kernel: newConnection -

Regex Result:

[('IN', 'eth0'), ('MAC', '56:00:01:a1:5c:b7:fe:00:01:a1:5c:b7:08:00'), ('SRC', '91.103.125.80'), ('DST', '45.33.223.166'), ('LEN', '52'), ('TOS', '0x00'), ('PREC', '0x00'), ('TTL', '113'), ('ID', '21200'), ('PROTO', 'TCP'), ('SPT', '55743'), ('DPT', '445'), ('WINDOW', '8192'), ('RES', '0x00'), ('URGP', '0')] 

Log String:

Aug 13 17:16:33 app-srv01 kernel: newConnection - IN=eth0 OUT= MAC=56:00:01:a1:5c:b7:fe:00:01:a1:5c:b7:08:00 SRC=91.103.125.80 DST=45.33.223.166 LEN=52 TOS=0x00 PREC=0x00 TTL=113 ID=21200 DF PROTO=TCP SPT=55743 DPT=445 WINDOW=8192 RES=0x00 SYN URGP=0

Output expected:

[('Aug 13 17:16:33'), ('app-srv01 kernel:'), ('newConnection -'), 
('IN', 'eth0'), ('MAC', '56:00:01:a1:5c:b7:fe:00:01:a1:5c:b7:08:00'), ('SRC', 
'91.103.125.80'), ('DST', '45.33.223.166'), ('LEN', '52'), ('TOS', '0x00'), ('PREC', 
'0x00'), ('TTL', '113'), ('ID', '21200'), ('PROTO', 'TCP'), ('SPT', '55743'), ('DPT', 
'445'), ('WINDOW', '8192'), ('RES', '0x00'), ('URGP', '0')] 

Can someone help? I'm using Python 3. Thanks.

dmrpy

2 Answers


You can do that with re.split, using each space that precedes a key=value item (such as abc=def) as the separator, and then splitting each item a second time on the equals sign:

[x.split('=') for x in re.split(r' (?=\S+=)', s)]
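As a rough sketch of how this could be applied to the log line from the question, with the date then pulled out of the leading item by hand: the names `parts`, `header`, `fields`, `date` and `rest` are made up for illustration, and the code assumes the timestamp is always the first three whitespace-separated tokens.

```python
import re

s = ("Aug 13 17:16:33 app-srv01 kernel: newConnection - IN=eth0 OUT= "
     "MAC=56:00:01:a1:5c:b7:fe:00:01:a1:5c:b7:08:00 SRC=91.103.125.80 "
     "DST=45.33.223.166 LEN=52 TOS=0x00 PREC=0x00 TTL=113 ID=21200 DF "
     "PROTO=TCP SPT=55743 DPT=445 WINDOW=8192 RES=0x00 SYN URGP=0")

# Split on every space that is immediately followed by a KEY= token.
parts = re.split(r' (?=\S+=)', s)

# The first item is the syslog header; the remaining items are KEY=VALUE.
header = parts[0]
fields = [p.split('=', 1) for p in parts[1:]]

# Assumption: the timestamp is the first three whitespace-separated tokens.
month, day, clock, rest = header.split(None, 3)
date = f"{month} {day} {clock}"

print(date)    # Aug 13 17:16:33
print(rest)    # app-srv01 kernel: newConnection -
print(fields)
```

Note that bare flags such as DF and SYN stay attached to the preceding value (for example `['ID', '21200 DF']`), so they may need a further split if they matter.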
Casimir et Hippolyte
  • Hi, this works fine: [['Aug 13 17:16:33 app-srv01 kernel: newConnection -'], ['IN', 'eth0'], ['OUT', ''], ['MAC', '56:00:01:a1:5c:b7:fe:00:01:a1:5c:b7:08:00'], ['SRC', '91.103.125.80'], ['DST', '45.33.223.166'], ['LEN', '52'], ['TOS', '0x00'], ['PREC', '0x00'], ['TTL', '113'], ['ID', '21200 DF'], ['PROTO', 'TCP'], ['SPT', '55743'], ['DPT', '445'], ['WINDOW', '8192'], ['RES', '0x00 SYN'], ['URGP', '0']]. But is it possible to separate the date "Aug 13 17:16:33" from "app-srv01 kernel: newConnection -"? The date is a key for me; the other parts, like app-srv01 and newConnection, are less important. – dmrpy Aug 08 '19 at 13:30
  • @dmrpy: Yes, do it "by hand" in a second step. – Casimir et Hippolyte Aug 08 '19 at 16:12

If you want the date at the start (the other two parts being less important, as noted in the comments) together with the matches from your current pattern, you might use an alternation:

^([a-zA-Z]+ \d{1,2} \d{1,2}:\d{1,2}:\d{1,2})|([^ ]+)=([^ ]+)
  • ^ Start of the string
  • ([a-zA-Z]+ \d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) Capture group 1, match a date-like pattern
  • | Or
  • ([^ ]+)=([^ ]+) Your initial pattern, capturing the key in group 2 and the value in group 3
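To illustrate how the alternation behaves with re.findall, here is a small sketch on a shortened sample line (the sample and the variable names are made up for illustration). Each match is returned as a 3-tuple, and the groups belonging to the branch that did not match come back as empty strings, which is why they are filtered out afterwards.

```python
import re

pattern = r"^([a-zA-Z]+ \d{1,2} \d{1,2}:\d{1,2}:\d{1,2})|([^ ]+)=([^ ]+)"
# Shortened, illustrative sample line.
sample = "Aug 13 17:16:33 app-srv01 kernel: newConnection - IN=eth0 OUT= SRC=91.103.125.80"

# One tuple per match, with '' for the groups of the branch that did not match.
raw = re.findall(pattern, sample)
print(raw)
# [('Aug 13 17:16:33', '', ''), ('', 'IN', 'eth0'), ('', 'SRC', '91.103.125.80')]

# Drop the empty groups from each tuple.
cleaned = [tuple(g for g in match if g) for match in raw]
print(cleaned)
# [('Aug 13 17:16:33',), ('IN', 'eth0'), ('SRC', '91.103.125.80')]
```

OUT= does not appear in the result because ([^ ]+) requires at least one character after the equals sign.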


For example

import re

# Group 1: date at the start of the line; groups 2 and 3: key and value.
regex = r"^([a-zA-Z]+ \d{1,2} \d{1,2}:\d{1,2}:\d{1,2})|([^ ]+)=([^ ]+)"
test_str = "Aug 13 17:16:33 app-srv01 kernel: newConnection - IN=eth0 OUT= MAC=56:00:01:a1:5c:b7:fe:00:01:a1:5c:b7:08:00 SRC=91.103.125.80 DST=45.33.223.166 LEN=52 TOS=0x00 PREC=0x00 TTL=113 ID=21200 DF PROTO=TCP SPT=55743 DPT=445 WINDOW=8192 RES=0x00 SYN URGP=0"

# findall returns a 3-tuple per match; filter(None, ...) drops the empty groups.
print(list(map(lambda x: tuple(filter(None, x)), re.findall(regex, test_str))))

Result

[('Aug 13 17:16:33',), ('IN', 'eth0'), ('MAC', '56:00:01:a1:5c:b7:fe:00:01:a1:5c:b7:08:00'), ('SRC', '91.103.125.80'), ('DST', '45.33.223.166'), ('LEN', '52'), ('TOS', '0x00'), ('PREC', '0x00'), ('TTL', '113'), ('ID', '21200'), ('PROTO', 'TCP'), ('SPT', '55743'), ('DPT', '445'), ('WINDOW', '8192'), ('RES', '0x00'), ('URGP', '0')]
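If the host/tag part and the log prefix are also wanted as separate items, as in the expected output from the question, one possible variation is to match the header once with named groups and then run the original key=value pattern on the rest of the line. This is only a sketch: `HEADER`, `PAIR` and `parse_line` are hypothetical names, and it assumes every line has the layout "timestamp host tag: prefix KEY=VALUE ...".

```python
import re

# Assumed header layout: "<Mon> <day> <hh:mm:ss> <host> <tag>: <prefix> KEY=..."
HEADER = re.compile(
    r"^(?P<date>[A-Za-z]{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<source>\S+ \S+:) "
    r"(?P<prefix>.*?) (?=\S+=)"
)
# The key=value pattern from the question, unchanged.
PAIR = re.compile(r"([^ ]+)=([^ ]+)")


def parse_line(line):
    """Return the three header parts as 1-tuples followed by (key, value) pairs."""
    m = HEADER.match(line)
    if m is None:
        raise ValueError("line does not match the assumed header layout")
    header = [(m.group("date"),), (m.group("source"),), (m.group("prefix"),)]
    # Only scan the part of the line that follows the header.
    return header + PAIR.findall(line[m.end():])


line = ("Aug 13 17:16:33 app-srv01 kernel: newConnection - IN=eth0 OUT= "
        "MAC=56:00:01:a1:5c:b7:fe:00:01:a1:5c:b7:08:00 SRC=91.103.125.80 "
        "DST=45.33.223.166 LEN=52 TOS=0x00 PREC=0x00 TTL=113 ID=21200 DF "
        "PROTO=TCP SPT=55743 DPT=445 WINDOW=8192 RES=0x00 SYN URGP=0")

result = parse_line(line)
print(result[:4])
# [('Aug 13 17:16:33',), ('app-srv01 kernel:',), ('newConnection -',), ('IN', 'eth0')]
```

The header parts are returned as 1-tuples because `('Aug 13 17:16:33')` without a trailing comma is just a string in Python. As with the original pattern, OUT= and the bare flags DF and SYN are skipped.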

The fourth bird
  • Hi, is it possible to separate the date from the other parameters too? The date is a key for me. – dmrpy Aug 08 '19 at 13:32
  • @dmrpy Do you want to exclude the date from `Aug 13 17:16:33 app-srv01 kernel: newConnection - IN`? What should/does it look like? – The fourth bird Aug 08 '19 at 13:34
  • No, I need the date separated into its own group, because the date is a key for me. I have added an expected output. – dmrpy Aug 08 '19 at 13:39