3

For configuration purposes, if I store an "easy" regex in a JSON file and load it into my Python program, it works just fine.

{
    "allow": ["\/word\/.*"],
    "follow": true
},

If I store a more complex regex in a JSON file, the same Python program fails.

{
    "allow": ["dcp\=[0-9]+\&dppp\="],
    "follow": true
},

That's the code that loads my JSON file:

src_json = kw.get('src_json') or 'sources/sample.json'
self.MY_SETTINGS = json.load(open(src_json))

The error is usually the same, and my online searches on it suggest that regular expressions should not be stored in JSON files.

json.decoder.JSONDecodeError: Invalid \escape: line 22 column 38 (char 801)

YAML files seem to have similar limitations, so I guess I shouldn't go down that road.

Now, I've stored my expression inside a dict in a separate file:

mydict = {"allow": "com\/[a-z]+(?:-[a-z]+)*\?skid\="}

and load it from my program file:

exec(compile(source=open('expr.py').read(), filename='expr.py', mode='exec'))

print(mydict)

Which works and would be fine with me - but it looks a bit ... special ... with exec and compile.

Is there any reason not to do it in this way? Is there a better way to store complex data structures and regular expressions in external files which I can open / use in my program code?

martineau
Chris
  • Consider using Python's `pickle` module. Also see [Making object JSON serializable with regular encoder](https://stackoverflow.com/questions/18478287/making-object-json-serializable-with-regular-encoder). – martineau Jan 27 '19 at 13:53
  • Note that you should use the `r` string prefix for regex expressions (and not doing so might actually be the cause of some of your storage problems). – martineau Jan 27 '19 at 13:58

3 Answers

2

The link you indicate is the JSON specification. It doesn't say anything about regular expressions as far as I can tell.

What you seem to be doing is taking a working regular expression and pasting it into your JSON file for (re-)use. That does not always work, because some characters (the backslash in particular) have to be escaped for the JSON to be valid.

There is, however, a simple way of inserting the regular expression into the JSON file with the appropriate escapes: write a small Python program that takes the regular expression as a command-line parameter and then uses json.dump() to write the JSON file or, alternatively, loads, updates and dumps the existing file with the new regular expression.
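A minimal sketch of such a load-update-dump helper might look like the following. The function names `store_regex` and `main`, the script name in the usage message, and the choice of the `"allow"` key as the default are assumptions made for illustration; only `json.load()` and `json.dump()` are the real API here.

```python
import json
import sys


def store_regex(path, pattern, key="allow"):
    """Load the JSON file at `path` (if it exists), store `pattern`
    as a one-element list under `key`, and write the file back."""
    try:
        with open(path, encoding="utf-8") as fh:
            settings = json.load(fh)
    except FileNotFoundError:
        settings = {}  # start a new settings file
    settings[key] = [pattern]
    with open(path, "w", encoding="utf-8") as fh:
        # json.dump() doubles every backslash, so the output is valid JSON
        json.dump(settings, fh, indent=4)
    return settings


def main(argv):
    """Hypothetical command line: store_regex.py FILE REGEX"""
    if len(argv) != 2:
        print("usage: store_regex.py FILE REGEX", file=sys.stderr)
        return 1
    store_regex(argv[0], argv[1])
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

When calling this from a shell, put the regex in single quotes so the shell does not interpret the backslashes or the `&`. Loading the file later with json.load() gives back the original pattern with single backslashes.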

Anthon
1

First, regular expressions can be stored in JSON, but they need to be stored as valid JSON. The invalid escape sequences in your second example are the cause of your JSONDecodeError.

There are other answers here on SO that explain how to properly encode/decode a regex as valid JSON, such as: Escaping Regex to get Valid JSON
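As a rough illustration using the second pattern from the question: doubling each backslash makes the JSON valid, and json.loads() turns the doubled backslashes back into single ones. The variable names are only for this sketch, and the JSON text is held in Python raw strings so that Python itself does not touch the backslashes.

```python
import json
import re

# Valid JSON: every backslash of the regex is doubled.
valid = r'{"allow": ["dcp\\=[0-9]+\\&dppp\\="], "follow": true}'
settings = json.loads(valid)
pattern = settings["allow"][0]  # the regex with single backslashes again
matched = re.search(pattern, "dcp=11&dppp=") is not None

# The unescaped text from the question: \= is not a legal JSON escape.
invalid = r'{"allow": ["dcp\=[0-9]+\&dppp\="], "follow": true}'
try:
    json.loads(invalid)
    decode_failed = False
except json.JSONDecodeError:
    decode_failed = True
```

This also explains why the "easy" example loads: `\/` happens to be one of the escapes that JSON allows, whereas `\=` and `\&` are not.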

The other parts of your question are more a matter of best practice and opinion.

As you've seen, you certainly can declare & use variables from other files:

test_regex.py

mydict = {'allow': 'com\\/[a-z]+(?:-[a-z]+)*\\?skid\\='}

script.py

from test_regex import mydict
print(mydict)
# {'allow': 'com\\/[a-z]+(?:-[a-z]+)*\\?skid\\='}

However, this feels like a rather different use case. In the JSON example, the settings are stored so that they are easy to reconfigure: different JSON files (perhaps one per environment) could each carry different regexes. In the Python-module example, we don't assume configurability; instead, test_regex exists for separation of concerns and readability.
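To make the configurable variant concrete, here is a small sketch that reads whichever JSON file it is given and compiles the patterns once at load time. `load_settings` is a hypothetical helper, and the assumption that `"allow"` holds a list of pattern strings comes from the question's examples.

```python
import json
import re


def load_settings(path):
    """Read one JSON settings file and compile every pattern under "allow"."""
    with open(path, encoding="utf-8") as fh:
        settings = json.load(fh)
    # Compile up front, so a broken regex fails at load time with re.error
    # rather than later, when the pattern is first used.
    settings["allow"] = [re.compile(p) for p in settings.get("allow", [])]
    return settings
```

Switching configurations then only means passing a different path, for example the question's `sources/sample.json` or a per-environment file.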

kyle
0

If you're storing your dictionary in a .py file you can import the variable directly as long as the file can be found in your PYTHONPATH or you use a relative import.

For instance, suppose I create a file called expr.py in a folder that is on PYTHONPATH.

The contents of the file (same as your example):

mydict = {"allow": "com\/[a-z]+(?:-[a-z]+)*\?skid\="}

Then I can run this from an interpreter or another script

>>> from expr import mydict
>>> mydict
{'allow': 'com\\/[a-z]+(?:-[a-z]+)*\\?skid\\='}

There is no need to mess around with open() and exec, unless I'm missing something here. I use this approach for storing regexes because you can store compiled pattern objects (the result of re.compile) directly.

If I change the file to:

import re
mydict = {"allow": re.compile(r"com\/[a-z]+(?:-[a-z]+)*\?skid\=")}

I can do:

>>> from expr import mydict
>>> print(mydict)
{'allow': re.compile('com\\/[a-z]+(?:-[a-z]+)*\\?skid\\=')}
>>> print(mydict["allow"].pattern)
com\/[a-z]+(?:-[a-z]+)*\?skid\=
>>> print(mydict["allow"].match("com/x-x?skid="))
<_sre.SRE_Match object; span=(0, 13), match='com/x-x?skid='>

If the file holds a large number of regexes, having every variable namespaced under the module name can help with organisation too:

file:

import re
mydict = {"allow": re.compile(r"com\/[a-z]+(?:-[a-z]+)*\?skid\=")}
easydict = {"allow": re.compile(r"\/word\/.*"), "follow": True}
complexdict = {"allow": re.compile(r"dcp\=[0-9]+\&dppp\="), "follow": True}

interpreter:

>>> import expr
>>> print(expr.easydict["allow"].pattern)
\/word\/.*
>>> print(expr.complexdict["allow"].match("dcp=11&dppp="))
<_sre.SRE_Match object; span=(0, 12), match='dcp=11&dppp='>
>>> print(expr.mydict)
{'allow': re.compile('com\\/[a-z]+(?:-[a-z]+)*\\?skid\\=')}
Zhenhir