
I am trying to grab a hostname from configs, and sometimes a -p or -s is appended to the hostname in the config that is not really part of the hostname. So I wrote this regex to fetch the hostname from the config file:

REGEX_HOSTNAME = re.compile('^hostname\s(?P<hostname>(\w|\W)+?)(-p|-P|-s|-S)?$\n',re.MULTILINE)

hostname = REGEX_HOSTNAME.search(config).group('hostname').lower().strip()

This is a sample part of the config that I am using the regex on:

terminal width 120
hostname IGN-HSHST-HSH-01-P
domain-name sample.com

But the hostnames in my result list still have the -P at the end:

ign-hshst-hsh-01-p
ign-hshst-hsh-02-p
ign-hshst-hsd-10
ign-hshst-hsh-01-S
ign-hshst-hsd-11
ign-hshst-hsh-02-s

In the regex101 online tester it works, and the -P is matched by the last group. In my Python (2.7) script it does not work.
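One possible explanation, assuming (this is not confirmed anywhere in the thread) that the real config files use Windows CRLF line endings while the pasted sample and the online testers use plain \n: with re.MULTILINE, $ only matches before a \n, so a \r left after the suffix makes (-p|-P|-s|-S)?$ fail, and the lazy group keeps growing until it has swallowed -P\r. The later .strip() then removes the \r, which hides what happened. The sample strings below are hypothetical; the pattern is the question's one, written as a raw string (same pattern, same behavior). It is written for Python 3, but $ behaves the same way in 2.7.

```python
import re

# The question's pattern, unchanged apart from being a raw string
REGEX_HOSTNAME = re.compile(r'^hostname\s(?P<hostname>(\w|\W)+?)(-p|-P|-s|-S)?$\n', re.MULTILINE)

# Hypothetical samples: the same config with Unix (\n) and Windows (\r\n) line endings
config_lf = "terminal width 120\nhostname IGN-HSHST-HSH-01-P\ndomain-name sample.com\n"
config_crlf = config_lf.replace("\n", "\r\n")


def extract(config):
    # Same post-processing as in the question: lower-case, then strip whitespace (including \r)
    return REGEX_HOSTNAME.search(config).group('hostname').lower().strip()


print(extract(config_lf))    # ign-hshst-hsh-01
print(extract(config_crlf))  # ign-hshst-hsh-01-p  (the group captured "IGN-HSHST-HSH-01-P\r")
```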

The strange thing is that when I use a slightly modified two-pass regex, it works:

REGEX_HOSTNAME = re.compile(r'^hostname\s*(?P<hostname>.*?)\n?$', re.MULTILINE)
REGEXP_CLUSTERNAME = re.compile('(?P<clustername>.*?)(?:-[ps])?$')
hostname = REGEX_HOSTNAME.search(config).group('hostname').lower().strip()
clustername = REGEXP_CLUSTERNAME.match(hostname).group('clustername')

Now hostname holds the full name, and clustername holds the name without the optional '-P' at the end.
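If the config files use Windows CRLF line endings (an assumption; the thread never confirms it), that would explain why this two-pass version works: the first regex captures IGN-HSHST-HSH-01-P\r, but .strip() removes the \r before the second regex runs, so the $ in REGEXP_CLUSTERNAME lands right after the suffix. A minimal sketch with a hypothetical CRLF sample:

```python
import re

# The question's two patterns (raw strings; same behavior)
REGEX_HOSTNAME = re.compile(r'^hostname\s*(?P<hostname>.*?)\n?$', re.MULTILINE)
REGEXP_CLUSTERNAME = re.compile(r'(?P<clustername>.*?)(?:-[ps])?$')

# Hypothetical sample saved with CRLF line endings
config = "terminal width 120\r\nhostname IGN-HSHST-HSH-01-P\r\ndomain-name sample.com\r\n"

raw = REGEX_HOSTNAME.search(config).group('hostname')  # still ends with '\r'
hostname = raw.lower().strip()                         # strip() drops the '\r'
clustername = REGEXP_CLUSTERNAME.match(hostname).group('clustername')

print(repr(raw))    # 'IGN-HSHST-HSH-01-P\r'
print(hostname)     # ign-hshst-hsh-01-p
print(clustername)  # ign-hshst-hsh-01
```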

Empusas
    You get [`ign-hshst-hsh-01`](https://ideone.com/rNpG9W) with the code above. Are you sure you have shared the exact code you have? – Wiktor Stribiżew May 01 '20 at 14:13
  • I'm not sure if it's part of the problem, but you should always use a raw string for regexp. – Barmar May 01 '20 at 14:15
    While it is true raw string literals should be used when defining regexps in Python code, it is not the problem here. For now, it is just unclear, as the code shown does not show the said behavior. – Wiktor Stribiżew May 01 '20 at 14:18
  • Yes, that is the exact code copied from my script. I tried it over and over again. In all the online regex testers it works. In my script running on Mac it does not. For some reason the hostname group also grabs the '-p'. I am glad to change my regex if you have a better suggestion. – Empusas May 01 '20 at 14:39
    Try this code - https://rextester.com/AGMTN72300 – Wiktor Stribiżew May 01 '20 at 14:59
  • Please check my answer. If it does not help, please provide more details, namely: how do you read the data in code? – Wiktor Stribiżew May 01 '20 at 15:43
  • Any feedback...? – Wiktor Stribiżew May 02 '20 at 08:52
  • Hi, with your code I get exactly the same result as with mine. The '-p' is still included in the hostname. – Empusas May 02 '20 at 10:23

1 Answer


You may use

import re

config=r"""terminal width 120
hostname IGN-HSHST-HSH-01-P
domain-name sample.com"""

REGEX_HOSTNAME = re.compile(r'^hostname\s*(.*?)(?:-[ps])?$', re.MULTILINE|re.I)
hostnames =[ h.lower().strip() for h in REGEX_HOSTNAME.findall(config) ]
print(hostnames) # => ['ign-hshst-hsh-01']

See the Python demo. The ^hostname\s*(.*?)(?:-[ps])?$ regex matches:

  • ^ - start of a line (due to re.MULTILINE, it matches a position after line breaks, too)
  • hostname - a word (case insensitive, due to re.I)
  • \s* - zero or more whitespace characters
  • (.*?) - Group 1: zero or more chars other than line break chars, as few as possible
  • (?:-[ps])? - an optional occurrence of - and then p or s (case insensitive!)
  • $ - end of a line (due to re.MULTILINE).

See the regex demo online.
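If the config files use CRLF line endings (an assumption, not confirmed in the thread), this pattern behaves like the question's one, which would fit the comment that it gives the same result: with re.MULTILINE, $ matches only before \n, so (.*?) grows until it includes -P\r. A sketch of two possible workarounds, allowing an optional \r before $ or normalizing line endings first; the sample string and the helper name hostnames() are hypothetical:

```python
import re

# Hypothetical sample: the question's config saved with CRLF line endings
config = "terminal width 120\r\nhostname IGN-HSHST-HSH-01-P\r\ndomain-name sample.com\r\n"

# The pattern from this answer
ANSWER_RX = re.compile(r'^hostname\s*(.*?)(?:-[ps])?$', re.MULTILINE | re.I)
# Same pattern, but an optional \r is allowed right before the end of the line
CRLF_SAFE_RX = re.compile(r'^hostname\s*(.*?)(?:-[ps])?\r?$', re.MULTILINE | re.I)


def hostnames(rx, text):
    # Same post-processing as in the answer
    return [h.lower().strip() for h in rx.findall(text)]


print(hostnames(ANSWER_RX, config))                        # ['ign-hshst-hsh-01-p']
print(hostnames(CRLF_SAFE_RX, config))                     # ['ign-hshst-hsh-01']
print(hostnames(ANSWER_RX, config.replace('\r\n', '\n')))  # ['ign-hshst-hsh-01']
```

Reading the file in universal-newline mode (open(path, 'rU') in Python 2.7, or the default text mode in Python 3) should have the same effect as the replace() call.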

Wiktor Stribiżew