81

What are the best practices for mitigating SQL injection attacks when using SQLAlchemy?

Mike
  • 58,961
  • 76
  • 175
  • 221

4 Answers4

110

tldr: Avoid raw SQL as much as possible.

The accepted answer is lazy and incorrect. The filter method accepts raw SQL, and if used in that way, is fully susceptible to SQL injection attacks. For instance, if you were to accept a value from a url and combine it with raw sql in the filter, you are open to attack:

session.query(MyClass).filter("foo={}".format(getArgs['val']))

using the above code and the below url, you would be injecting SQL in to your filter statement. The code above would return all rows in your database.

URL encoded:

https://example.com/?val=2%20or%201%20=%201

Easier to understand (URL decoded):

https://example.com/?val=2 or 1 = 1
galoget
  • 722
  • 9
  • 15
Tendrid
  • 1,257
  • 1
  • 8
  • 5
  • 8
    "unless you deliberately bypass SQLAlchemy's quoting mechanisms..." Yes entering raw sql is deliberately bypassing that quoting mechanism. So no, the above answer is not incorrect. – Johnston Jan 01 '14 at 12:51
  • 33
    I disagree. That you can pass raw sql to the filter method is part of sqlalchemy, not some end-around hack...so its worth noting here as something to be aware of. – Mike Jan 18 '14 at 22:09
  • 3
    If I have to take in user input for a filter, what's the correct way to make sure the user isn't entering raw SQL to drop tables or any other unexpected behavior? – divide_by_zero Sep 03 '15 at 06:40
  • 1
    @divide_by_zero use the orm filter methods, thats what they're for. Never use raw sql. – Tendrid Nov 20 '15 at 18:35
  • 2
    @divide_by_zero well use this `session.query(MyClass).filter(MyClass.foo == "{}".format(getArgs['val']))` This will probably throw psycopg2.InternalError invalid syntax if you try to inject something – Ja8zyjits Jan 12 '16 at 10:52
  • 1
    Why on earth would you do this instead of `session.query(MyClass).filter(MyClass.foo==getArgs['val'])` or `session.query(MyClass).filter_by(foo=getArgs['val'])`? – Mark Apr 13 '18 at 13:26
  • I agree @Mark I find this answer somewhat misleading if useful as a warning. I don't understand why anyone would do this. – Konrad Jul 19 '21 at 21:42
  • 1
    Its not about what someone would or wouldn't do, i wrote this answer years ago to simply outline what you _could_ and _shouldn't_ do. The only way to answer the question stated by OP was to explain the only way to actually inject SQL. All of that being said, I have _absolutely_ seen developers build complex SQL strings, then pass them in to the filter function, leading to SQLi. – Tendrid Jul 21 '21 at 00:17
58

If you have any "special" characters (such as semicolons or apostrophes) in your data, they will be automatically quoted for you by the SQLEngine object, so you don't have to worry about quoting. This also means that unless you deliberately bypass SQLAlchemy's quoting mechanisms, SQL-injection attacks are basically impossible.

[per http://www.rmunn.com/sqlalchemy-tutorial/tutorial.html]

Carson Ip
  • 1,896
  • 17
  • 27
  • 50
    The answer says that the quote comes from "the" documentation, when it doesn't: it seems to come from [a tutorial](http://www.rmunn.com/sqlalchemy-tutorial/tutorial.html) not associated with SQLAlchemy. Second, the quote is in context of part of the SQLAlchemy API that will correctly handle escaping, using an example that handles escaping. However, you can still use `execute()` or other literal data that will NOT be escaped by SQLAlchemy. Yes, in MOST cases SQLAlchemy will auto-escape, but if you are using literals or raw SQL, you can still shoot yourself in the foot. – Mark Hildreth Jan 16 '14 at 17:16
  • 1
    I'm looking to find the specific lines of code in the SQLAlchemy code repo that substantiates what @carson ip references as SQLAlchemy's "quoting mechanisms" Any tips? – emalcolmb Nov 08 '21 at 23:19
16

To add to @Tendrid answer. I did a little investigation using quiet naive approach. filter method has *criterion as its argument, several other ORM Query methods have similar argument.

In case of filter method *criterion argument ends up passed into _literal_as_text, which in case of string - marks it as safe sql (please correct me if I'm wrong). Therefore it makes it unsafe.

Here is outcome of ORM Query class method investigation with *criterion argument:

filter   - uses _literal_as_text (NOT SAFE)
having   - uses _literal_as_text (NOT SAFE)

distinct - uses _literal_as_label_reference (NOT SAFE)
group_by - uses _literal_as_label_reference (NOT SAFE)
order_by - uses _literal_as_label_reference (NOT SAFE)

join     - uses model attributes to resolve relation (SAFE)

Examples of possible method missuses (to keep it simple, string formatting is skipped):

db.session.query(User.login).group_by('login').having('count(id) > 4; select name from roles').all()
db.session.query(User.login).distinct('name) name from roles /*').order_by('*/').all()
db.session.query(User.login).order_by('users_login; select name from roles').all()
db.session.query(User.login).group_by('login union select name from roles').all()

Note that these methods are only unsafe if string literal is passed.

aisbaa
  • 9,867
  • 6
  • 33
  • 48
4

I'm inclined to agree with @Tendrid's answer.

If you write this:

session.query(MyClass).filter("foo={}".format(getArgs['val']))

... you are creating an injection vulnerability.

SqlAlchemy's approach is to use bound parameters to avoid these injection attacks. The way you're meant to use filter() is to write:

session.query(MyClass).filter(MyClass.foo == getArgs['va'])

As SqlAlchemy has overloaded python's operators like == to escape the SQL correctly (and avoid injection). See here

There is a warning about this buried in the documentation of SqlAlchemy here that says:

Always use bound parameters

As mentioned at the beginning of this section, textual SQL is not the usual way we work with SQLAlchemy. However, when using textual SQL, a Python literal value, even non-strings like integers or dates, should never be stringified into SQL string directly; a parameter should always be used. This is most famously known as how to avoid SQL injection attacks when the data is untrusted. However it also allows the SQLAlchemy dialects and/or DBAPI to correctly handle the incoming input for the backend. Outside of plain textual SQL use cases, SQLAlchemy’s Core Expression API otherwise ensures that Python literal values are passed as bound parameters where appropriate.

And there's a section in the glossary on bound parameters here

It says:

Bound parameters are the primary means in which data is passed to the DBAPI database driver. While the operation to be invoked is based on the SQL statement string, the data values themselves are passed separately, where the driver contains logic that will safely process these strings and pass them to the backend database server, which may either involve formatting the parameters into the SQL string itself, or passing them to the database using separate protocols.

The specific system by which the database driver does this should not matter to the caller; the point is that on the outside, data should always be passed separately and not as part of the SQL string itself. This is integral both to having adequate security against SQL injections as well as allowing the driver to have the best performance.

Basically that means this:

session.query(MyClass).filter("foo={}".format(getArgs['val']))

... is broken because you're passing the data to the filter() together with the SQL statement foo=<data>.

You're meant to always keep the statement and data separate, i.e.:

session.query(MyClass).filter(MyClass.foo == getArgs['va'])

or

session.query(MyClass).filter_by(foo=getArgs['va'])

As then SqlAlchemy can work it's magic and do the escaping with bound parameters.

Donal
  • 8,430
  • 3
  • 16
  • 21
  • "SqlAlchemy has overloaded python's operators like == to escape the SQL correctly" - no - SQLAlchemy delegates escaping of values to the underlying DB-API connection. SQLAlchemy constructs the SQL statement with placeholders for values, then passes the statement and values to the DB-API connections `cursor.execute` method. This is what the glossary quote is saying. – snakecharmerb Jan 20 '22 at 15:36
  • 1
    Erm, well. Yes, your comment is more complete. But you're proving the correctness of what I said. If SqlAlchemy hadn't overloaded the python operators like ==, then Python would evaluate the == before a method like filter() receives anything. And it would result in a boolean. The boolean won't result in a useful SQL query. So SqlAlchemy has overloaded the python operators like == to return Sql statements wrapped in python objects. That allows it to do escaping whenever it passes those statements down to the DB-API. We could edit my answer to include what you've said because it's informative. – Donal Jan 21 '22 at 11:36