0

Please don't reach for the "duplicate" gun just yet.

I need to generate a series of SQL statements involving literal strings that contain the occasional single quote. Yeah, I know that parametrized queries are the way to go. The thing is, I'm not communicating with a database directly: I'm generating an SQL script that will be used to load data on another computer. So, I don't mind issuing parametrized queries to my local database (mysql), but I'll need to output the complete SQL commands in text form. Can I do that in python? The only suggestions I saw on SO are hacks like using repr() or json.dumps(), or specific to psycopg. Surely that can't be all there is?

This application will never deal with untrusted data, so although I'd like the most robust solution possible, I'm not too worried about malformed unicode attacks and the like. What's the best way to do this?

Community
  • 1
  • 1
alexis
  • 48,685
  • 16
  • 101
  • 161
  • Do you need to output SQL commands or do you have any control about how your data is going to get treated? – syntonym Feb 23 '16 at 19:52
  • I have some control, but I want to use python for this particular task and I don't want to make it a project dependency on the other end. Also the data needs to be transformed, so I don't feel like transforming into an intermediate form and *then* writing an importer that will need to be distributed... – alexis Feb 23 '16 at 19:56
  • Just curious why you consider `repr()` a hack. Are there situations in which it will break? – gil Feb 23 '16 at 20:07
  • Also see [this so question](http://stackoverflow.com/questions/6775497/psycopg-get-formatted-sql-instead-of-executing) – syntonym Feb 23 '16 at 20:15
  • 1
    @gill, `repr()` targets python syntax, not sql. It adds double quotes that need to be stripped; it may escape unicode characters an a way SQL does not understand; etc. – alexis Feb 23 '16 at 20:20
  • Thanks. I didn't consider the unicode issue. (Stripping quotes not a big deal to me :) – gil Feb 23 '16 at 20:25

3 Answers3

1

You can subclass the psycopg2 cursor class to use mogrify to write the queries to a file instead of executing them against a database. You could probably also use it directly (and save you setting up a database etc.).

You could also use the query attribute for "playback" of a session.

syntonym
  • 7,134
  • 2
  • 32
  • 45
1

While generating complex SQLs may have it's difficulties, I love python for straight INSERTs. No tools, no libraries, just plain python solves all issues out of the box:

# syntactic sugar
def sq(s):
    return s.replace("'", "''")

# a third kind of string delimiters :3
sql_template = """INSERT INTO mytab ("the key", f1, f2) 
    VALUES (%d, '%s', '%s');"""

the_key = 7
f1 = "Hello World"
f2 = "That's ok"

# escape only string colums
sql = sql_template % (the_key, sq(f1), sq(f2))

print sql
flaschbier
  • 3,910
  • 2
  • 24
  • 36
  • I must admit, I can't come up with a good objection to `str.replace`. But somehow, after all the dire warnings, I'm reluctant to do my own quoting... – alexis Feb 24 '16 at 20:54
  • After all, if the input can be trusted, eventual issues will almost inevitably lead to import issues that will immediately be obvious. *And* your code will be easily peer reviewed, as opposed to the use of some fat library module (or the `awk` scripts we did in the stone age). Just saying ;) – flaschbier Feb 24 '16 at 21:07
  • 1
    I agree, and I upvoted your answer. But I don't want import issues, or silent data corruption, a year from now when I'm in a hurry... – alexis Feb 25 '16 at 08:42
0

The process for producing SQL is the same for any language. You need to first understand the entirety of the language (or data format; same difference here) that you want to produce. Failure to do this is the primary failing of most attempts: since they don't understand the language they don't handle all the input that can cause a problem. When you know the entirety of the language, including all corner cases and rarely used constructs, then you can create code that can understand all possible inputs and correctly represent them in that language. The easiest solution is to let someone else who has already dealt with it do the hard part, or to sidestep the entire mess by using a different representation. This is why everyone recommends parameterized queries. It pushes the responsibility to someone else who has already solved it, and usually they solve it by using a different representation in the protocol than the SQL language itself.

dsh
  • 12,037
  • 3
  • 33
  • 51
  • Yes, thank you so much for explaining all this. I'm happy to delegate SQL code generation to parametrized queries, but they don't generate SQL (that I can get a hold of)-- they just run the query. – alexis Feb 23 '16 at 20:01