If I understand your requirements correctly, some of the special "chars" are actually two-character strings (specifically "&&" and "||"). The best way to handle such a mixed collection is a regular expression. Use a character class to match any of the single-character tokens, then use vertical bars to add alternative patterns, which can be multi-character. The trickiest part is the backslash-escaping: for example, to match "||" you need r'\|\|', because the vertical bar is special in a regular expression. Inside a character class, the backslash is special, and so are '-' and ']' (and '^' when it comes first). The code:
import re
_s_pat = r'([\\+\-!(){}[\]^"~*?:]|&&|\|\|)'
_pat = re.compile(_s_pat)
def escape_query(query):
    return re.sub(_pat, r'\\\1', query)
I suspect the above is about the fastest solution possible in Python, because it pushes the work down into the regular expression machinery, which is written in C (in CPython).
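As a quick sanity check, here is a small sketch of the function applied to a sample query. The sample strings are made up for illustration; the pattern is the one shown above.

```python
import re

# Same pattern as above: single special characters, or the tokens && and ||.
_pat = re.compile(r'([\\+\-!(){}[\]^"~*?:]|&&|\|\|)')

def escape_query(query):
    # Prefix every matched token with a single backslash.
    return _pat.sub(r'\\\1', query)

print(escape_query('(a+b) && c'))  # prints: \(a\+b\) \&& c
print(escape_query('a | b & c'))   # prints: a | b & c  (single | and & are left alone)
```

Note that a two-character token gets one backslash in front of the whole token, not one per character.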
If you don't like the look of the regular expression, you can make it easier to read by using the verbose format: compile it with the re.VERBOSE flag. Then you can spread the regular expression across multiple lines and put comments after any parts you find confusing.
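A minimal sketch of what the verbose form might look like for this pattern. The layout and comments are just one way to write it; note that whitespace and '#' are only ignored outside a character class, so the class itself is kept on one line.

```python
import re

_pat = re.compile(r'''
    (                          # group 1: the token to escape
        [\\+\-!(){}[\]^"~*?:]  # any single special character
      | &&                     # or the two-character token &&
      | \|\|                   # or the two-character token ||
    )
''', re.VERBOSE)

def escape_query(query):
    # Prefix every matched token with a single backslash.
    return _pat.sub(r'\\\1', query)
```

This should behave the same as the compact one-line pattern; only the source layout differs.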
Or you can build your list of special characters, just like you already did, and run it through this function, which automatically compiles a regular expression pattern that matches any alternative in the list. I made sure it will match nothing if the list is empty.
import re
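def make_pattern(lst_alternatives):
    if lst_alternatives:
        # re.escape() handles the backslash-escaping of each alternative
        temp = '|'.join(re.escape(s) for s in lst_alternatives)
        s_pat = '(' + temp + ')'
    else:
        # '$^' can still match an empty string, so use an empty negative
        # lookahead instead; the group keeps \1 valid in replacements
        s_pat = r'((?!))'
    return re.compile(s_pat)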
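Here is a sketch of how that function might be wired up. The contents of special_chars are my reconstruction from the hand-written pattern earlier, not your original list, and this sketch uses an empty negative lookahead as its never-match pattern for the empty-list case.

```python
import re

def make_pattern(lst_alternatives):
    if lst_alternatives:
        temp = '|'.join(re.escape(s) for s in lst_alternatives)
        s_pat = '(' + temp + ')'
    else:
        s_pat = r'((?!))'  # assumption: empty negative lookahead, never matches
    return re.compile(s_pat)

# Assumed list, mirroring the hand-written pattern.
special_chars = ['\\', '+', '-', '!', '(', ')', '{', '}', '[', ']',
                 '^', '"', '~', '*', '?', ':', '&&', '||']

_pat = make_pattern(special_chars)  # built once, at import time

def escape_query(query):
    return _pat.sub(r'\\\1', query)

print(escape_query('a[1] || b'))  # prints: a\[1\] \|| b
```

One thing to keep in mind: alternatives are tried left to right, so if the list ever contains both a single character and a longer token starting with that character, put the longer token first.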
By the way, I recommend you put the string and the pre-compiled pattern outside the function, as I showed above. In your code, Python rebuilds the list and binds it to the name special_chars on every function invocation.
If you don't want to put anything but the function into the namespace, here's a way to do it without any run-time overhead:
import re
def escape_query(query):
    return re.sub(escape_query.pat, r'\\\1', query)
escape_query.pat = re.compile(r'([\\+\-!(){}[\]^"~*?:]|&&|\|\|)')
The above uses the function's name to look up the attribute, which won't work if you rebind the function's name later. There is a discussion of this, and a good solution, in the question "How can Python function access its own attributes?"
(Note: The above paragraph replaces some stuff including a question that was discussed in the discussion comments below.)
Actually, upon further thought, I think this is cleaner and more Pythonic:
import re
_pat = re.compile(r'([\\+\-!(){}[\]^"~*?:]|&&|\|\|)')
def escape_query(query, pat=_pat):
    return re.sub(pat, r'\\\1', query)
del _pat  # not required but you can do it
When the def statement for escape_query() is executed, the object bound to the name _pat is also bound to a name inside the function's namespace (that name is pat), because default argument values are evaluated once, at definition time. You can then use del to unbind the name _pat if you like. This nicely encapsulates the pattern inside the function, does not depend at all on the function's name, and allows you to pass in an alternate pattern if you wish.
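To illustrate, here is a short sketch showing that the function keeps working after the module-level name is gone, and how an alternate pattern could be passed in. The name only_plus and the sample strings are hypothetical.

```python
import re

_pat = re.compile(r'([\\+\-!(){}[\]^"~*?:]|&&|\|\|)')

def escape_query(query, pat=_pat):
    return re.sub(pat, r'\\\1', query)

del _pat  # the function still holds the pattern as its default argument

print(escape_query('a+b'))  # prints: a\+b

# Hypothetical alternate pattern that only escapes '+'.
only_plus = re.compile(r'(\+)')
print(escape_query('a+b (c)', pat=only_plus))  # prints: a\+b (c)
```

The default is stored on the function object itself (in escape_query.__defaults__), which is why unbinding _pat does not affect it. Any alternate pattern needs a capturing group 1, since the replacement string refers to \1.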
P.S. If your special characters were always a single character long, I would use the code below:
_special = {'[', ']', '\\', '+'}  # add other characters as desired, but only single chars
def escape_query(query):
    return ''.join('\\' + ch if ch in _special else ch for ch in query)