I am trying to quickly load data from the internet into a table using Flask and SQLAlchemy with PostgreSQL/psycopg2. I am in a slight argument with a co-worker; we will call him "Dad." Dad argues that we cannot execute a raw SQL query because of the possibility of SQL injection. I argue that we can, if it is properly formatted, though that is tough to do and the ORM should generally be used. The following example seems to me to be a simple enough illustration of the issue.
# Flask-SQLAlchemy
from flask import Flask, render_template
from flask_sqlalchemy import SQLAlchemy

server = Flask(__name__)
server.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
server.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql+psycopg2://...'
db = SQLAlchemy(server)

@server.route('/<column>')
def index(column):
    # DANGER: user input is concatenated straight into the SQL string
    result = db.session.execute("SELECT " + column + " FROM item_profit")
    return render_template('index.html', data=result)
Someone could essentially inject any raw SQL into column and have it run against that table. See this link for other simple injections. I saw this SO answer (see the second answer), which seems to imply that a properly formatted string will not lead to SQL injection. Their code looks like this:
result = db.session.execute('SELECT * FROM my_table WHERE my_column = :val', {'val': 5})
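One wrinkle I have noticed: bind parameters like :val only cover values, not identifiers, so the route above cannot be fixed with a placeholder alone, since column is a column name. My understanding is that the usual fix is a whitelist. A minimal sketch of what I mean (ALLOWED_COLUMNS and the column names are placeholders I made up):

from flask import abort
from sqlalchemy import text

ALLOWED_COLUMNS = {'name', 'cost'}  # hypothetical set of permitted column names

@server.route('/<column>')
def index(column):
    if column not in ALLOWED_COLUMNS:
        abort(400)  # reject anything not explicitly whitelisted
    # Interpolation is safe here only because column is now one of our own literals
    result = db.session.execute(text("SELECT {} FROM item_profit".format(column)))
    return render_template('index.html', data=result)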
Dad says that DBAs at quite a few companies prevent other departments from having execute authority, i.e. they would not be able to use
result = db.session.execute("...")
I think that is because they are worried a regular user will not know to properly format the query. Is that correct? It is not that executing a SQL query can cause SQL injection, but that an improperly formatted query can. This is why people say to use the ORM: it ensures properly formatted queries every time. Is this correct thinking? ORMs are just performing the execute behind the scenes; if executing a query were itself an injection issue, then even ORMs could not be used. Heck, even SQL itself could not be used, correct? I hope to use raw SQL in certain cases, as we might be dealing with huge datasets where processing speed is a major factor and the ORM is much, much slower.
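To check my own reasoning, I printed a query built with SQLAlchemy Core to see what execute() receives behind the scenes. A minimal sketch (the table and column names are just illustrations, and I am assuming the 1.x-style select([...])):

from sqlalchemy.sql import column, select, table

item_profit = table('item_profit', column('profit'))
stmt = select([item_profit.c.profit]).where(item_profit.c.profit > 100)
print(stmt)
# SELECT item_profit.profit
# FROM item_profit
# WHERE item_profit.profit > :profit_1

The 100 never appears in the SQL string; it is shipped to psycopg2 separately as a bind parameter, which is exactly the "properly formatted" execute from the SO answer above.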
My follow-up question, then: what is the best way to load a CSV into a SQL database while minimizing raw SQL? Dad imported the CSV into Postgres using a PgAdmin import (which saves a lot of time over writing out the import in psql). I know another way would be the pandas.DataFrame.to_sql() method, but bringing pandas in just for this does not seem like the best design here. I typically use the ORM, but I am not sure if there is a quick way of getting the CSV in this way. Let's say I had a CSV of a table, Product, with two columns, Name and Cost. To make this in the ORM, I would do the following:
...
db = SQLAlchemy(app)

class Product(db.Model):
    __tablename__ = 'product'
    id = db.Column(db.Integer, primary_key=True)  # SQLAlchemy models require a primary key
    name = db.Column(db.String)
    cost = db.Column(db.Numeric)  # there is no db.Double in older SQLAlchemy; Numeric suits a cost
However, I am back to the issue of getting the CSV into Python to load into the table. There is pandas.read_csv(). I guess I could also open the file itself and loop through it with the csv module. Something like:
import csv

# 'products.csv' and the header names are placeholders
with open('products.csv') as f:
    for record in csv.DictReader(f):
        product = Product(name=record['Name'], cost=record['Cost'])
        db.session.add(product)
    db.session.commit()  # one commit after the loop, not per row
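If looping row by row turns out to be slow, I have also seen bulk_insert_mappings(), which skips building full model objects and inserts plain dicts. A sketch of how I think it would look (same placeholder file and headers, untested):

import csv

with open('products.csv') as f:
    rows = [{'name': r['Name'], 'cost': r['Cost']} for r in csv.DictReader(f)]
db.session.bulk_insert_mappings(Product, rows)  # one batched INSERT path via the ORM session
db.session.commit()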
Is there a better way? Does the ORM have a way to load a CSV while letting me avoid raw SQL most of the time?
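For completeness: as far as I can tell, the PgAdmin import Dad used is just PostgreSQL's COPY underneath, which psycopg2 exposes as copy_expert. A sketch of that route (untested, same placeholder filename, and it is raw SQL, so perhaps it just restates Dad's point; note no user input is interpolated into the statement):

# Drop down to the raw psycopg2 connection that SQLAlchemy manages
conn = db.engine.raw_connection()
try:
    with conn.cursor() as cur, open('products.csv') as f:
        cur.copy_expert(
            "COPY product (name, cost) FROM STDIN WITH (FORMAT csv, HEADER)",
            f,
        )
    conn.commit()
finally:
    conn.close()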