Use re.sub:
In [1]: import re
In [2]: text = '{ query: { and: [ { and: [ { _t: "Manifest" }, { or: [ { and: [ { _i: { gt: "53b2616fe4b028359ac3fea4" } } ] } ] }, { _s: "active" } ] }, { ENu_v: { elemMatch: { EOJ_v: { in: [ "*", "Production", "QA " ] } } } } ] }, orderby: { _i: 1 } } '
In [3]: re.sub(r'(\w+):', r'"\1":', text)
Out[3]: '{ "query": { "and": [ { "and": [ { "_t": "Manifest" }, { "or": [ { "and": [ { "_i": { "gt": "53b2616fe4b028359ac3fea4" } } ] } ] }, { "_s": "active" } ] }, { "ENu_v": { "elemMatch": { "EOJ_v": { "in": [ "*", "Production", "QA " ] } } } } ] }, "orderby": { "_i": 1 } } '
Note that you have to use a raw string literal (or escape \1 as \\1) for the replacement text; otherwise Python interprets \1 as an octal escape before re.sub sees it, and you won't get the expected output.
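To make the difference concrete, here is a small sketch comparing the two spellings of the replacement text. The shortened sample string and the variable names are only for illustration.

```python
import re

# Shortened, illustrative input in the same style as the question's text.
text = '{ query: { _t: "Manifest" } }'

# Raw replacement: the backslash reaches re.sub, which reads \1 as
# "insert capture group 1".
with_raw = re.sub(r'(\w+):', r'"\1":', text)

# Non-raw replacement: Python itself turns '\1' into the control character
# chr(1) before re.sub is called, so the captured name is lost.
without_raw = re.sub(r'(\w+):', '"\1":', text)

print(with_raw)           # { "query": { "_t": "Manifest" } }
print(repr(without_raw))  # the keys are replaced by '\x01'
```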
I have assumed that your text doesn't contain "strange" things like:
- colons inside a value (e.g. {a: "some:string"}; the "some:string" isn't preserved by this solution)
- complex strings that contain nested structure (e.g. {a: "{b : \"hello\"}"})
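The first case can be reproduced with a short sketch. The helper name quote_keys is hypothetical, and json.loads is used here only as a convenient check that the result is well formed.

```python
import json
import re

def quote_keys(text):
    # Same substitution as above, wrapped in a helper for reuse.
    return re.sub(r'(\w+):', r'"\1":', text)

ok = quote_keys('{a: "plain"}')
broken = quote_keys('{a: "some:string"}')

print(ok)      # {"a": "plain"}
print(broken)  # {"a": ""some":string"}

json.loads(ok)  # parses without error
try:
    json.loads(broken)
    broken_is_valid = True
except json.JSONDecodeError:
    # The word before the colon inside the value was quoted as well,
    # which corrupts the string.
    broken_is_valid = False
```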
If these assumptions don't hold, you have to actually parse the text; you cannot safely transform it using regexes alone.
The ast module together with the third-party codegen module makes it easy to manipulate such data. For example, you can create a NodeTransformer subclass such as:
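import ast

class QuoteNames(ast.NodeTransformer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._inside_dict = False

    def visit_Name(self, node):
        # Turn bare names found inside a dict display into string literals.
        # ast.Constant replaces ast.Str, which newer Python versions removed.
        if self._inside_dict:
            return ast.copy_location(ast.Constant(node.id), node)
        return node

    def visit_Dict(self, node):
        # Remember the outer state so nested dicts restore it correctly.
        previous = self._inside_dict
        self._inside_dict = True
        self.generic_visit(node)
        self._inside_dict = previous
        return node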
And use it as:
import ast, codegen
codegen.to_source(QuoteNames().visit(ast.parse(text)))
However, your sample text is not a syntactically valid Python literal: some brackets aren't well matched (which is probably an error in your example), some string values are missing their closing quotes, and the reserved words "and" and "or" cannot be used as identifiers.
If you can fix the format to match Python syntax, then the ast-based solution is much more robust than the one using regexes. If that is not possible, you'd have to write your own parser for it, or look for a third-party module that can do that.
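For input that already matches Python syntax, the same idea can also be sketched with the standard library alone, assuming Python 3.9 or newer, where ast.unparse is available as a stand-in for codegen. The function name quote_dict_keys and the sample string are made up for illustration. Unlike the transformer above, this variant quotes only names that appear as dict keys.

```python
import ast

def quote_dict_keys(source):
    """Return `source` with bare-name dict keys rewritten as string keys."""
    tree = ast.parse(source, mode="eval")
    for node in ast.walk(tree):
        if isinstance(node, ast.Dict):
            # Replace Name keys with string constants; leave other keys alone.
            node.keys = [
                ast.Constant(value=key.id) if isinstance(key, ast.Name) else key
                for key in node.keys
            ]
    return ast.unparse(ast.fix_missing_locations(tree))

# Hypothetical input: no keyword keys such as `and`/`or`, balanced brackets.
sample = '{query: {_t: "Manifest", _i: {gt: 5}}, orderby: {_i: 1}}'
quoted = quote_dict_keys(sample)

# The rewritten text is now a plain literal that can be evaluated safely.
data = ast.literal_eval(quoted)
```

Because every key is now a string constant, ast.literal_eval can turn the result into an ordinary dict without executing arbitrary code.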