1

Using a regular expression, I need to match only the IPv4 subnet mask from the given input string:

ip=10.0.20.100::10.0.20.1:255.255.254.0:ws01.example.com::off

For testing, this input string is stored in a text file called file.txt. The actual use case is to parse /proc/cmdline, so I need a solution that starts parsing, counting fields, and matching after encountering "ip=" and stops at the next white space character.

I'm using bash 4.2.46 with GNU grep 2.20 on an x86_64 EL 7.9 workstation to test the expression.

Based on examples from other questions, I've come up with the following grep command and PCRE regular expression, which gives output very close to what I need.

[user@ws01 ~]$ grep -o -P '(?<!:)(?:\:[0-9])(.*?)(?=:)' file.txt 
:255.255.254.0

My understanding of what I've done here is as follows: I start with a negative lookbehind for a ":" character to try to exclude the first "::" field, followed by a non-capturing group that matches an escaped ":" character and a digit, [0-9], then a capturing group with .*? for the string itself, and finally a lookahead for the next ":" character.

The problem is that this gives the desired string but includes an extra : character at the beginning.

Expected output should look like this:

255.255.254.0

What's making this tricky for me to figure out is that the delimiters are not consistent. The string includes both double colons and single colons, so I haven't been able to simply match the string between the delimiters. The double colons appear because a field can have an empty value. For example:

:<null>:ip:gw:netmask:hostname:<null>:off

<null> is shown here to indicate an omitted value, one that the user does not need to provide for the intended purpose.

I've tried a few different expressions suggested in other answers that use negative lookbehinds and lookaheads to avoid starting a match at a : that is neighbored by another :.

For example, see this question: Regular Expression to find a string included between two characters while EXCLUDING the delimiters

If I can start matching at the first single colon, by itself, which is not followed by or preceded by another : character, while excluding the colon character as the delimiter, and continue matching until the next single colon which is also not neighboring another : and without including the colon character, that should match the desired string.

I'm able to match the exact string by including "255" in an expression like this (which will work for all of our present use cases):

[user@ws01 ~]$ grep -o -P '(?:)255.*?(?=:)' file.txt
255.255.254.0

The logic problem here is that the subnet mask may not always start with "255", but it should start with a digit, [0-9], which is why I'm attempting to use that in the expression above. For the sake of simplicity, I don't need to validate that each octet is no greater than 255.

oguz ismail
Chris
  • 3
    You want the "255.255.254.0" ? And the format of the lines is always the same? Do `cut -d":" -f4`. – Nic3500 Nov 02 '22 at 17:36
  • 2
    Can fields be empty? Is it possible that it's all single-colon delimiters but some fields are just empty a lot? Do you have headers, and/or know the purpose/content of each field? These things matter. – Paul Hodges Nov 02 '22 at 18:04
  • 2
    @Paul Hodges Yes, the user providing input at the grub prompt (the kernel command line) can omit a value, resulting in an unpopulated field. Hence the double colons at the beginning and the end. Those are actually placeholders for additional values that could be provided. – Chris Nov 02 '22 at 18:58
  • 1
    @Paul Hodges, The number of fields is fixed and known, and the values for IP address, gateway, netmask, hostname, device name, nameserver etc, always appear in the same respective fields. Their positioning does not change in my case with EL7, anaconda, and kickstart. – Chris Nov 02 '22 at 19:17
  • 1
    @Nic3500 Up-voted. This is simple and clean, and it avoids the need for a regular expression. Thanks – Chris Nov 02 '22 at 19:43
  • 1
    @Paul Hodges, In my use case, no value will ever need to be provided for the field with index 2 (the value that would be placed between the first :: pattern in the input string shown in the question). That field will always be empty. In theory a user could provide that input, but in practice they won't. – Chris Nov 02 '22 at 20:24
  • 1
    @ChrisSmith I made it an answer. – Nic3500 Nov 03 '22 at 00:40

4 Answers

3

Using gnu-grep you could write the pattern as:

grep -oP '(?<!:):\K\d{1,3}(?:\.\d{1,3}){3}(?=:(?!:))' file.txt

Output

255.255.254.0

Explanation

  • (?<!:): Negative lookbehind, assert not : to the left and then match :
  • \K Forget what is matched until now
  • \d{1,3}(?:\.\d{1,3}){3} Match 4 times 1-3 digits separated by .
  • (?=:(?!:)) Positive lookahead, assert : that is not followed by :
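Where grep -P is not available, the same "single colon on both sides" idea can be approximated without lookarounds. This is only a sketch, not part of the original pattern: it swaps in sed -E with an extended regular expression, consumes the neighbouring characters instead of asserting on them, and prints just the captured dotted quad. The function name single_colon_quad is made up for illustration, and the sketch assumes at most one such field per input line.

```shell
# Hypothetical helper: print the dotted quad that sits between two single colons.
# Reads lines on stdin; lines without such a field print nothing.
single_colon_quad() {
    # (.*[^:])?    optional prefix that must not end in ':'  (stands in for the lookbehind)
    # :(quad):     the field itself, captured as \2
    # ([^:].*)?$   remainder must not start with ':'         (stands in for the lookahead)
    sed -nE 's/^(.*[^:])?:([0-9]{1,3}(\.[0-9]{1,3}){3}):([^:].*)?$/\2/p'
}

printf '%s\n' 'ip=10.0.20.100::10.0.20.1:255.255.254.0:ws01.example.com::off' | single_colon_quad
```

Because the whole line has to match, other words on a real kernel command line that happen to contain a colon-delimited dotted quad could change the result, so treat this as a starting point only.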

See a regex demo.

The fourth bird
2

Using grep

$ grep -oP '(?<!:)?:\K([0-9.]+)(?=:[[:alpha:]])' file.txt

or

$ grep -oP '[^:]*:\K[^:[:alpha:]]*' file.txt

Output

255.255.254.0
HatLess
2

If these are delimiters, your value should be in a clearly predictable place.

Just treat every colon as a delimiter and select the 4th field.

$: awk -F: '{print $4}' <<< ip=10.0.20.100::10.0.20.1:255.255.254.0:ws01.example.com::off
255.255.254.0
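Since the real input is /proc/cmdline, where the ip= argument is one of several whitespace-separated words, the same field-counting idea can be applied per word. A minimal sketch, assuming the argument always starts with ip= and the netmask is always the fourth colon-separated field. The function name netmask_from_cmdline and the sample command line are invented for illustration.

```shell
# Hypothetical helper: read a kernel command line on stdin and print
# field 4 of every word that starts with "ip=".
netmask_from_cmdline() {
    awk '{
        for (i = 1; i <= NF; i++) {          # walk the whitespace-separated words
            if (index($i, "ip=") == 1) {     # word begins with ip=
                split($i, field, ":")        # every colon is a delimiter; empty fields are kept
                print field[4]
            }
        }
    }'
}

# Made-up command line for demonstration only.
printf '%s\n' 'BOOT_IMAGE=/vmlinuz ro ip=10.0.20.100::10.0.20.1:255.255.254.0:ws01.example.com::off quiet' | netmask_from_cmdline
```

On a real system this would presumably be called as netmask_from_cmdline < /proc/cmdline.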

I'm not sure what you mean by

What's making this tricky for me to figure out is that the delimiters are not consistent. The string includes both double colons, and single colon fields, so I haven't been able to just simply match on the string between the delimiters.

If your delimiters aren't predictable and parse-able, they are useless. If you mean the fields can have or not have quotes, but you need to exclude quotes, we can do that. If double colons are one delimiter and single colons are another that's horrible design, but we can probably handle that, too.

$: awk -F'::' '{ split($2,x,":"); print x[2];}' <<< ip=10.0.20.100::10.0.20.1:255.255.254.0:ws01.example.com::off
255.255.254.0

For quotes, you need to provide an example.

Paul Hodges
  • 1
    I've updated the question to indicate why :: (double colons) appear in the string. Thanks for the feedback. The quote character does not appear in the input string. Thankfully, I have no need (so far) to deal with single or double quotes. Since the number of fields is known and stays the same, your suggestion of referencing the correct field by its index is a good one. – Chris Nov 02 '22 at 19:38
  • 1
    According to the documentation the fields and their respective values are as follows: ip=:::::: – Chris Nov 02 '22 at 20:06
  • 1
    To be fair, Nic3500 suggested basically the same thing using `cut` in a comment right off. XD – Paul Hodges Nov 03 '22 at 14:11
2

Since the number of fields is always the same, simply separated by ":", you can use cut. That solution will also work if you have empty fields.

cut -d":" -f4
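If you would rather avoid an external command, the shell's own read can do the same split. This is a sketch under the same assumption that the netmask is always field 4; the function name netmask_of and the variable names are illustrative only. With a non-whitespace IFS character such as ":", adjacent delimiters produce empty fields instead of being collapsed, so the "::" gaps are still counted.

```shell
# Hypothetical helper: print field 4 of a single ip=... argument passed as $1.
netmask_of() {
    # IFS is set to ':' only for this read; the empty fields from '::' are kept.
    # Note: in plain POSIX sh these variables are global, not local.
    IFS=: read -r addr unused gw netmask rest <<EOF
$1
EOF
    printf '%s\n' "$netmask"
}

netmask_of 'ip=10.0.20.100::10.0.20.1:255.255.254.0:ws01.example.com::off'
```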
Nic3500
  • 1
    I've combined this with grep and a regex to start matching at "ip=" until the next white space: `grep -oP '(?=ip=).*?(?=\s)' file.txt | cut -d":" -f4`, which prints `255.255.254.0`. This is far better than my original approach because, as you mentioned, I can always get the value from the correct field index, regardless of whether or not any of the fields are blank. – Chris Nov 03 '22 at 16:30