0

What is the best way to sanitize a SQL to prevent injection when using python? I'm using mysql-connector. I have read that I should use a structure similar to:

import mysql.connector

connection = mysql.connector.connect(host="", port="", user="", password="", database="")
cursor = connection.cursor( buffered = True )

sql = "INSERT INTO mytable (column1, column2) VALUES (%s, %s)"
val = (myvalue1, myvalue2)

cursor.execute(sql, val)
connection.commit()

However, I don't understand why this can prevent an injection. Is this sufficient? A user could introduce me anything on myvalue1 or myvalue2, even if it is not suposed to. Is there any useful library?

Learning from masters
  • 2,032
  • 3
  • 29
  • 42
  • 1
    Sanitization is the wrong approach entirely. Stop mixing data and code at all; use bind variables to keep data out of band. – Charles Duffy Mar 24 '22 at 14:58
  • 1
    And you _are_ doing that already, so you're fine. Better designed databases, which MySQL last I knew was not, even keep data and parameters separate from each other at the wire protocol level, and beyond to the parser. – Charles Duffy Mar 24 '22 at 14:59
  • 1
    @Charles You're somewhat outdated there, MySQL supports prepared statements natively just fine: https://dev.mysql.com/doc/refman/8.0/en/sql-prepared-statements.html – deceze Mar 24 '22 at 15:05
  • @deceze, there are special considerations for the Python connector. By default, it _simulates_ prepared statements, but really does string interpolation. You must use a subclass of the cursor to get a real prepared statement. – Bill Karwin Mar 24 '22 at 15:09
  • I'll never claim to be up-to-date on MySQL; got fed up with it after a series of data loss events in the early 2000s (after already having a chip on my shoulder from having to maintain a system built by someone who believed its authors' claims about relational and transactional integrity being expensive and unnecessary luxuries in the late 90s -- claims they notably dropped as soon as they had working foreign key enforcement and a suitable transactional model themselves); so I've been in the PostgreSQL world ever since. – Charles Duffy Mar 24 '22 at 15:12
  • 1
    @CharlesDuffy Most technology can improve, given 20 years (exception: Microsoft Windows). And all technology has warts -- even PostgreSQL. – Bill Karwin Mar 24 '22 at 15:33
  • @BillKarwin, I believe in technology improving a lot more than I do in people improving. Publicly defending the decision to punt on data integrity features in something presented to the public as a production of a release of a database is an act I consider neigh-unforgivable, with respect to future public trust. Now, if there was a public apology that I missed, that might change things. – Charles Duffy Mar 24 '22 at 15:36
  • Aaand we're out of time. How does your next Tuesday at this same time sound? – Bill Karwin Mar 24 '22 at 15:41

1 Answers1

1

SQL injection works when untrusted input is interpolated into an SQL query and the input contains characters that change the syntax of the query.

Query parameters are kept separate from the SQL query, never interpolated into it. The values of the parameters are combined with the SQL query after it is parsed, so there is no longer any opportunity to change the syntax. The parameter is guaranteed to be treated as a single scalar value (i.e. as if it's just a string literal in an SQL expression).

This is the way the Python connector works if you use the MySQLCursorPrepared cursor subclass. See https://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursorprepared.html

Otherwise, the Python connector "simulates" prepared queries. It actually does interpolate parameters into the SQL query before it is parsed, but it does so safely, by escaping special characters that would cause SQL injection. It is well-tested so it's reliable.

Both cursor types are used the same way, passing an SQL query string with %s placeholders, and another argument with a tuple of parameter values. You are using it correctly.


Re comment from @Learningfrommasters:

Yes, a string stored in your database can be used unsafely in another SQL query, and cause SQL injection. Some people think that only user input must be treated safely, but this is not true. Any variable should be treated as a query parameter, whether the value for that variable comes from user input, or read from a file, or even pulled out of your own database.

Example: Suppose my name is Bill O'Karwin. It has an apostrophe in it, which you know is a special character to SQL because it terminates a string literal.

If my name were stored in the database and then fetched into an application into a variable userlastname, then I could search for other people with the same last name:

sql = f"SELECT * FROM Users WHERE lastname = '{userlastname}'"

That is unsafe because the apostrophe would cause SQL injection. Even though the value didn't come directly from user input, it came from my own database.

So use parameters for all variables. Then you don't have to think about whether the source is safe or not.

sql = "SELECT * FROM Users WHERE lastname = %s"
cur.execute(sql, (userlastname,))
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • I have one question then: Imagine someone introduces a bad value, which can cause an injection. Because I did the previous solution, it won't make an injection. However, it will be stored as a string. If later I have to do a select on that table, would that string be able to inject anything? – Learning from masters Mar 24 '22 at 15:04
  • @Learning Only if you handle that stored value incorrectly later on: https://en.wikipedia.org/wiki/SQL_injection#Second_order_SQL_injection – deceze Mar 24 '22 at 15:06
  • 1
    @Learningfrommasters, trying to "sanitize" data on the way in means you're putting all your defenses in one place, and if something gets past them the battle is lost. The industry has a long history of trying to fight that perimeter battle, and losing over and over (especially when the display layer, and its sanitization needs, change over time). Whereas if you make every place you handle or render data treat it in a safe, contextually-appropriate matter, the perimeter battle becomes moot. It's very much like an analogue to the zero-trust vs firewalls-for-everything models. – Charles Duffy Mar 24 '22 at 15:16
  • @Learningfrommasters, ...going back to what I said about sanitization needs and display layers: Sanitizing something to be safe when substituted into JSON is different from sanitizing it to be safe when substituted into HTML, which is different from sanitizing it to be safe when substituted into SQL, etc etc. You can't fight all those battles at once and win; better to make sure your HTML rendering escapes data as HTML, your code substituting data into JSON escapes it as JSON, etc -- then it's not the input layer's job. – Charles Duffy Mar 24 '22 at 15:18
  • @Learningfrommasters, ...I could write essays, and I have, about how even very large industry players are learning the wrong lessons from attacks against them and responding in ineffective ways. Suffice to say that the paragraphs above reflect expensively-learnt experience. – Charles Duffy Mar 24 '22 at 15:20