-1

Given a string with bindings:

input = "SELECT * FROM table WHERE col1 = %s AND col2 = %(col2_value)s"

I want to convert %s to ? and %(named)s to @named, such that I get:

input = "SELECT * FROM table WHERE col1 = ? AND col2 = @col2_value"

My current implementation uses re to swap the two types and return the list of bindings:

def _convert_bindings(query: str) -> Tuple[str, List[Union[None, str]]]:
    pos_rex = r"%s"
    named_rex = r"%\((\w+)\)s"
    bindings = [
        None if m.string == "%s" else re.search(r"%\((?P<name>\w+)\)s", m.string).group("name")
        for m in re.finditer(f"{pos_rex}|{named_rex}", query)
    ]
    sql = re.sub(pos_rex, r"?", query)
    sql = re.sub(named_rex, r"@\1", query)

    return sql, bindings

But I figured Python must have a builtin way to find & replace parameters, since str.format does it already.

So what's the most pythonic way to extract %s and %(named)s, and subsequently replace them? Is there a function/property that could be named something like "str.params" or a function getparams from a builtin module?

Assume no prior knowledge of the list of named parameters.

Philippe Hebert
  • 1,616
  • 2
  • 24
  • 51
  • 4
    May I ask WHY you are doing this? Are you switching from one database backend to another? – Tim Roberts Sep 24 '21 at 20:35
  • 2
    For what you describe, this solution seems fine. Any modifications to make the code "more pythonic" will likely not improve performance here. You already have a list comprehension and the rest of the code is just using the regular expression module, so I think this is fine. – h0r53 Sep 24 '21 at 20:38
  • I'd agree to use variable binding, but if you have the problematic string format queries in the code base, I'd think it's a manual refactoring effort to go through and fix all those queries. Or what is your use case? – Robert Sep 24 '21 at 20:38
  • @Tim Roberts + Robert: I use psycopg2 with native bindings in the code, and I parse the SQL using pglast in the tests to ensure that I am querying the expected source table. pglast does not support native bindings, only the PG bindings. – Philippe Hebert Sep 24 '21 at 21:34

1 Answers1

0
p = re.compile("%\(([^)]*)\)s")

input = "SELECT * FROM table WHERE col1 = %s AND col2 = %(col2_value)s"

query = input.replace("%s", "?")
query = p.sub(r"@\1" , query)
  
print(query)

# SELECT * FROM table WHERE col1 = ? AND col2 = @col2_value
Srdjan Grubor
  • 2,605
  • 15
  • 17
  • Can you explain why this is more pythonic than the code shown in the question? – mkrieger1 Sep 24 '21 at 21:08
  • 1
    @mkrieger1 It's only more Pythonic in design aspect of it by being easier to maintain and as simple to read/understand as you can have it. Doing things "by hand" using iteration where none is needed is the quintessential Pythonic anti-pattern. [The Zen of Python](https://legacy.python.org/dev/peps/pep-0020/) – Srdjan Grubor Sep 24 '21 at 21:14
  • Edit: this solution will also be _much_ faster as the pattern is compiled and persisted vs using the triple-arg `re.sub` method in the original. – Srdjan Grubor Sep 24 '21 at 21:17
  • The `re` module should cache compiled patterns automatically. – mkrieger1 Sep 24 '21 at 21:18
  • [It doesn't](https://docs.python.org/3/library/re.html#re.compile). You **have** to compile the expression and reuse the result for caching to take place. [A real-world example of this](https://github.com/DataDog/datadogpy/pull/672) showing 30% speed increase. – Srdjan Grubor Sep 24 '21 at 21:22
  • Generally caching expilictly occurs when you use a compiled pattern and then do `pattern.sub(replacement, input)`. This does not happen explicitly when you do `re.sub(search_pattern, replacement, input)` which is what the original poster's code is doing. See my real-world example link in previous comment. Edit: I see what you meant now but it's implicit vs explicit. I guess the module-level cache is super small (100 entries) but it would be fine in the original poster's snippet: https://stackoverflow.com/a/17325325/1121879 – Srdjan Grubor Sep 24 '21 at 21:28
  • @Srdjan Grubor, thanks for a simpler, cleaner implementation. It is indeed more pythonic because it is easier to read/understand. This being said, your impl seems to be more pythonic only because it doesn't solve the exact same problem as the function above, which returns also a list of the bindings separately. The question is really about better understanding builtin solutions to obtain c-style format parameters in a string without manual parsing. – Philippe Hebert Sep 24 '21 at 21:37
  • @PhilippeHebert That's fair - I did not realize that the `bindings` part was an important part of the question. It might be good to edit the text to highlight that it's not just about converting the string from one form to another. – Srdjan Grubor Sep 24 '21 at 21:43