
I want the following regex code to return only the IP addresses, without also returning other numeric values from the source file as IPs.

The Code:

import re

logdata = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'
for item in re.finditer(r"(?P<host>[\d.]+)", logdata):
    print(item.groupdict())

Required output:

{'host': '146.204.224.152'}

Unwanted output, for example:

{'host': '6811'}
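The unwanted matches come from the character class itself: `[\d.]+` accepts any run of digits and dots, so besides `6811` it also picks up the timestamp parts, the protocol version, and the status and size fields. A minimal sketch that lists every match on the sample line:

```python
import re

logdata = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'

# [\d.]+ matches any run of digits and dots, not only dotted IP addresses
matches = [m.group("host") for m in re.finditer(r"(?P<host>[\d.]+)", logdata)]
print(matches)
# ['146.204.224.152', '6811', '21', '2019', '15', '45', '24', '0700', '1.1', '302', '4622']
```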
Asked by Mustapha (edited by Tomerikoo)
2 Answers


I think this should do it:

(?P<host>(\d+\.){3}\d+)
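On the sample line this pattern requires three `digits.` groups followed by a final digit run, so lone numbers such as `6811` no longer match. It checks only the shape, not the range of each octet, so a string like `999.1.1.1` (a made-up example) would still match. A minimal sketch:

```python
import re

logdata = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'
pattern = r"(?P<host>(\d+\.){3}\d+)"

# groupdict() returns only named groups, so the unnamed (\d+\.) group is omitted
hosts = [m.groupdict() for m in re.finditer(pattern, logdata)]
print(hosts)  # [{'host': '146.204.224.152'}]

# Shape-only check: out-of-range octets are still accepted
loose = re.search(pattern, "999.1.1.1").group("host")
print(loose)  # 999.1.1.1
```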
Answered by ZygD
Comment by Mustapha (Mar 27 '21 at 16:53): For the line 146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622, how can I return the date and time, please?

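Use a pattern that accepts only valid IPv4 octets (0 to 255), with word boundaries so it does not match inside longer numbers: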

import re
logdata = r'146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'
for item in re.finditer(r"\b(?P<host>(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3})\b", logdata):
    print(item.groupdict())

Result: {'host': '146.204.224.152'}
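To see how this differs from a shape-only pattern, the sketch below builds the same regex from a reusable octet piece and checks a few inputs. The sample strings are made up for illustration; `fullmatch` is used so that each string must be a complete address.

```python
import re

# The answer's octet alternatives: 250-255, 200-249, or 0-199 (leading zeros allowed)
octet = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
# Four octets joined by dots, bounded by \b; {{3}} becomes {3} inside the f-string
ip_re = re.compile(rf"\b(?P<host>{octet}(?:\.{octet}){{3}})\b")

logdata = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'
found = [m.groupdict() for m in ip_re.finditer(logdata)]
print(found)  # [{'host': '146.204.224.152'}]

# Made-up inputs: which strings count as a whole IPv4 address?
samples = ["146.204.224.152", "255.255.255.255", "256.1.1.1", "999.1.1.1", "1.2.3"]
results = {}
for s in samples:
    m = ip_re.fullmatch(s)
    results[s] = m.group("host") if m else None
print(results)
```

Only the first two samples produce a match; `256.1.1.1` and `999.1.1.1` fail the octet range and `1.2.3` has too few parts.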

See also the question "Extract ip addresses from Strings using regex".

To get both the host and the timestamp from a log line like yours:

import re
logdata = r'146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'
match_data = re.search(r'^(?P<host>\S+).*?\[(?P<time>.*?)]', logdata)
if match_data:
    print(match_data.groupdict())

Result: {'host': '146.204.224.152', 'time': '21/Jun/2019:15:45:24 -0700'}
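If the date and time are needed as a real `datetime` rather than a string, the captured `time` group can be parsed with `datetime.strptime`. A minimal sketch, assuming every line uses the same `21/Jun/2019:15:45:24 -0700` layout as the sample (`%b` expects English month abbreviations, which holds in Python's default C locale):

```python
import re
from datetime import datetime

logdata = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'
match_data = re.search(r'^(?P<host>\S+).*?\[(?P<time>.*?)]', logdata)

if match_data:
    # Format assumed from the sample line: day/month-abbrev/year:H:M:S UTC-offset
    stamp = datetime.strptime(match_data.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    print(match_data.group("host"), stamp.isoformat())
    # 146.204.224.152 2019-06-21T15:45:24-07:00
```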

EXPLANATION

--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (?P<host>                group and capture to (?P=host):
--------------------------------------------------------------------------------
    \S+                      non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
  )                        end of (?P=host)
--------------------------------------------------------------------------------
  .*?                      any character except \n (0 or more times
                           (matching the least amount possible))
--------------------------------------------------------------------------------
  \[                       '['
--------------------------------------------------------------------------------
  (?P<time>                group and capture to (?P=time):
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of (?P=time)
--------------------------------------------------------------------------------
  ]                        ']'
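The same host-and-time pattern can be written with `re.VERBOSE` so each piece carries its own comment, mirroring the breakdown above. This is a sketch of an equivalent form, not a different technique; the only change is that the closing `]` is escaped, which matches the same character.

```python
import re

log_re = re.compile(
    r"""
    ^                 # beginning of the string
    (?P<host>\S+)     # host: one or more non-whitespace characters
    .*?               # any characters, as few as possible
    \[                # literal '['
    (?P<time>.*?)     # time: any characters up to the first ']'
    \]                # literal ']'
    """,
    re.VERBOSE,
)

logdata = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'
m = log_re.search(logdata)
print(m.groupdict() if m else None)
# {'host': '146.204.224.152', 'time': '21/Jun/2019:15:45:24 -0700'}
```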
Answered by Ryszard Czech
Comment by Mustapha (Mar 27 '21 at 16:47): Wow, thanks a lot! That's a long, nerdy piece of regex, lol. Could this regex also return the date and time from the same source file I shared before? ("(?P