
There is a good solution here for matching an IP with a mask, e.g. 192.168.0.1/24. I added the suggestion from https://regex101.com/ to escape the slash, and it looks like this:

((^|\.)((25[0-5])|(2[0-4]\d)|(1\d\d)|([1-9]?\d))){4}\/(?:\d|[12]\d|3[01])$

This definitely seems to work on regex101.
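As a quick sanity check outside regex101, here is a minimal sketch using Python's `re` module (my choice of engine for illustration). A raw string keeps the backslashes exactly as typed on regex101, so no extra escaping is needed at this stage. The test strings are made up.

```python
import re

# The regex from above as a raw string, so "\." and "\d" reach re unchanged.
SUBNET_RE = r"((^|\.)((25[0-5])|(2[0-4]\d)|(1\d\d)|([1-9]?\d))){4}\/(?:\d|[12]\d|3[01])$"

for candidate in ["192.168.0.1/24", "192.168.0.1", "192.168.0.1/50"]:
    # search() mirrors JSON Schema, whose "pattern" is not implicitly anchored.
    found = re.search(SUBNET_RE, candidate) is not None
    print(candidate, found)
```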

It needs to live inside a JSON file (a jsonschema file), but it seems to contain something illegal. I can't work out what it is. I have looked at this, this and this, and also tried using ujson instead of json (in Python) as suggested here, but nothing works.

The following piece of jsonschema, which contains that regex:

{
    "comment": "ipv4 with a mask",
    "data": {
        "network": {
        }
    },
    "schema": {
        "$schema": "http://json-schema.org/draft-04/schema#",
        "title": "ipv4 with a mask",
        "type": "object",
        "properties": {
            "subnet": {
                "title": "subnet",
                "type": "string",
                "pattern": "((^|\.)((25[0-5])|(2[0-4]\d)|(1\d\d)|([1-9]?\d))){4}\/(?:\d|[12]\d|3[01])$"
            }
        }
    }
}

...unfortunately, won't even parse. Python says:

JSONDecodeError: Invalid \escape: line 16 column 33 (char 380)

I have been using the fastjsonschema library to check these things, but I can't even parse the JSON to get that far.
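The error can be reproduced with the standard library alone, without fastjsonschema. JSON only allows the escapes `\"`, `\\`, `\/`, `\b`, `\f`, `\n`, `\r`, `\t` and `\uXXXX`, so the `\.` in the pattern is rejected before any schema logic runs. A minimal sketch; the tiny document it parses is a made-up fragment, not the full schema:

```python
import json

# Raw string, so the single backslashes reach the JSON parser exactly as in the schema file.
bad = r'{"pattern": "(^|\.)\d"}'

try:
    json.loads(bad)
except json.JSONDecodeError as exc:
    # Same failure as in the question: "\." is not a legal JSON escape.
    print(exc.msg, "at char", exc.pos)
```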

Does anyone know how to fix this, i.e. how to get that piece of regex to work in jsonschema?

cardamom
  • Json doesn't like backslashes so you will have to escape them `((^|\\.)((25[0-5])|(2[0-4]\\d)|(1\\d\\d)|([1-9]?\\d))){4}\\\/(?:\\d|[12]\\d|3[01])$` Use a site like https://www.jsonschemavalidator.net to check your schemas – Tom Nov 08 '18 at 16:34
  • @TomPowis I just put your string into the json but the parser in python is still not happy. @IvanGodko was looking at that earlier today, but believe it will validate `192.168.0.1` but not `192.168.0.1/24` – cardamom Nov 08 '18 at 16:38
  • Maybe save it as a raw string? `r"..."` Then of course without the escaping – user8408080 Nov 08 '18 at 16:39
  • @TomPowis In your JSON-escaped regex string, there is one extra backslash after the `{4}`. – aneroid Nov 08 '18 at 21:13

2 Answers


For JSON, you need to escape each backslash \ with another backslash:

((^|\\.)((25[0-5])|(2[0-4]\\d)|(1\\d\\d)|([1-9]?\\d))){4}\\/(?:\\d|[12]\\d|3[01])$

So in the JSON schema, it would look like:

"pattern": "((^|\\.)((25[0-5])|(2[0-4]\\d)|(1\\d\\d)|([1-9]?\\d))){4}\\/(?:\\d|[12]\\d|3[01])$"

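Rather than doubling backslashes by hand, one option (a sketch, not the only way) is to let Python's `json.dumps` produce the JSON string literal from the regex, then check that `json.loads` gives back exactly the regex that worked on regex101:

```python
import json

# The regex exactly as tested on regex101 (raw string: backslashes kept as typed).
regex = r"((^|\.)((25[0-5])|(2[0-4]\d)|(1\d\d)|([1-9]?\d))){4}\/(?:\d|[12]\d|3[01])$"

# json.dumps doubles every backslash and adds the surrounding quotes.
literal = json.dumps(regex)
print(literal)

# Parsing a schema fragment that uses the literal restores the original regex.
fragment = json.loads('{"pattern": %s}' % literal)
print(fragment["pattern"] == regex)  # True
```

The printed literal is the same value as the `"pattern"` line above.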
The regex you found (in the link) doesn't handle digit grouping well anyway. Try it with a few examples: the full match is correct, but the captured groups contain the numbers together with their dots, or just the dots.
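To see what is meant, here is a small sketch using Python's `re` as a stand-in for regex101's engine (an assumption; both treat these constructs the same way). Because the capturing groups sit inside `{4}`, each group keeps only its last repetition, dot included. The last line also shows that, with no `^` in front of the repeated group, a search can start part-way through a longer string.

```python
import re

regex = r"((^|\.)((25[0-5])|(2[0-4]\d)|(1\d\d)|([1-9]?\d))){4}\/(?:\d|[12]\d|3[01])$"

m = re.search(regex, "192.168.0.1/24")
print(m.group(0))  # 192.168.0.1/24: the full match is fine
# Groups inside {4} only keep their last repetition, dot included.
print(m.group(1), m.group(2), m.group(3))  # .1 . 1

# With no anchor in front of the group, a search can start mid-string.
print(re.search(regex, "999.1.2.3.4/24").group(0))  # .1.2.3.4/24
```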

If you want all the parts of the IP address and not just a full match, then here's a regex based on this one. I've included matching for an optional subnet mask:

^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
(?:\/(\d|[12]\d|3[01]))?$

(Remove the line breaks, which I've added for readability.) Demo here. Only the first 3 addresses should match, not the rest.
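The line breaks can also be removed in code rather than by hand. A minimal sketch with Python's `re`: the pieces are joined by plain string concatenation, and the four octets plus the optional mask come back as separate groups. The test addresses are made up; they are not the ones from the linked demo.

```python
import re

OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
# Same regex as above, assembled from its lines; the mask group is optional.
IP_WITH_MASK = re.compile(
    r"^" + OCTET + r"\." + OCTET + r"\." + OCTET + r"\." + OCTET
    + r"(?:\/(\d|[12]\d|3[01]))?$"
)

for addr in ["192.168.0.1/24", "10.0.0.1", "256.1.1.1", "192.168.0.1/33"]:
    m = IP_WITH_MASK.match(addr)
    print(addr, m.groups() if m else "no match")
```

Without a mask, the last group is `None`.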

And if you only want the full match, and not the individual parts, then use this:

^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
(?:\/(?:\d|[12]\d|3[01]))?$
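Since the pattern has to live in a JSON Schema file, one way to avoid hand-escaping altogether is to build the schema as a Python dict and let `json.dump` write it. A minimal sketch: an in-memory `io.StringIO` stands in for the real file, the schema shape mirrors the question's `subnet` property, and `re.search` stands in for a validator such as fastjsonschema.

```python
import io
import json
import re

# Full-match-only regex from above, joined into one line (raw strings, single backslashes).
ip_regex = (
    r"^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
    r"(?:\/(?:\d|[12]\d|3[01]))?$"
)

schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {"subnet": {"type": "string", "pattern": ip_regex}},
}

# Write and re-read the schema; json.dump puts the doubled backslashes in the file text.
f = io.StringIO()
json.dump(schema, f, indent=4)
f.seek(0)
loaded = json.load(f)

pattern = loaded["properties"]["subnet"]["pattern"]
print(pattern == ip_regex)  # True: the round trip leaves the regex unchanged
for subnet in ["192.168.0.1/24", "192.168.0.1", "192.168.0.900/24"]:
    print(subnet, bool(re.search(pattern, subnet)))
```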
aneroid

You won't believe it, but 2 backslashes were not enough!

It does not work with 2 backslashes; it needs 3 or 4, so I will go with 3. No need to give it more than it needs.

It took me a few more hours to realise this, but I found this answer from @TimPietzcker, which says:

You need to use escape the backslashes for the regex, and then escape them again for the string processor

So the working code looks like this (I tweaked the original schema slightly):

import json
import fastjsonschema

# Each regex backslash is written as three backslashes: Python reads "\\\." as
# "\\." and json.loads then turns that into the regex "\.".
schema = '''{
    "data": [{"subnet": "192.168.1.1/24"}],
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "subnet": {
                "title": "subnet",
                "type": "string",
                "pattern": "((^|\\\.)((25[0-5])|(2[0-4]\\\d)|(1\\\d\\\d)|([1-9]?\\\d))){4}\\\/(?:\\\d|[12]\\\d|3[01])$"
            }
        }
    }
}'''

schema = json.loads(schema)
# Compile the schema once into a reusable validator function.
validate = fastjsonschema.compile(schema)

def check_subnets(testcase):
    """Report whether testcase passes the subnet pattern."""
    try:
        validate([{"subnet": testcase}])
        print("yes a subnet")
    except fastjsonschema.JsonSchemaException:
        print("not a subnet")

Then some tests:

>>> check_subnets("192.168.0.1/24") 
yes a subnet
>>> check_subnets("192.168.0.1/50")
not a subnet
>>> check_subnets("192.168.0.1")
not a subnet
>>> check_subnets("192.168.0.900/24")
not a subnet
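Why three backslashes happen to work: the schema is a Python string literal before it is JSON, so there are two layers of unescaping. Python turns `\\` into `\` and leaves an unrecognised escape such as `\.` or `\d` as-is, so `\\\.` becomes `\\.`, which `json.loads` then turns into the regex `\.`. That relies on Python tolerating invalid escapes, which has been deprecated since Python 3.6 and produces a `SyntaxWarning` from 3.12 on. A sketch of two spellings that don't depend on it, using a shortened, made-up pattern for brevity: four backslashes in a normal string, or two in a raw string (the `r'...'` idea from the comments).

```python
import json
import re

# Four backslashes in a normal string: Python reduces "\\\\" to "\\",
# and json.loads reduces "\\" to the single "\" the regex needs.
four = '{"pattern": "^\\\\d+\\\\.\\\\d+$"}'
# Two backslashes in a raw string: Python keeps them, JSON reduces them to one.
raw = r'{"pattern": "^\\d+\\.\\d+$"}'

print(four == raw)  # True: both are the same JSON text
pattern = json.loads(raw)["pattern"]
print(pattern)  # ^\d+\.\d+$
print(bool(re.search(pattern, "12.34")))  # True
```

If the schema is instead read from a separate .json file, only the JSON layer applies, so two backslashes per regex backslash are enough there.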
cardamom
  • Wrt _"You need to use escape the backslashes for the regex, and then escape them again for the string processor"_: That's because you're putting your JSON string in Python and then loading it. If it was in a separate JSON/text file - _"It needs to live inside a json file (jsonschema file)"_ as per your question - you wouldn't need more than 1 extra backslash per regex-backslash. Which is where setting it as a 'raw' string with `r'some str'` comes in. Still think you'd need 3 _extra_ backslashes per regex-backslash without raw, so 4 total. Interesting that it works with 3. +1 for the follow up. – aneroid Nov 09 '18 at 16:12
  • Well, good that you have the patience for this kind of stuff; it's one of the most unpleasant problems I've had to deal with recently. I did not have much luck with `r'some str'`. I will certainly look straight at the number of backslashes if this is loaded from a file once it goes into production, or if it starts complaining about escaping again. – cardamom Nov 09 '18 at 16:23