0

I have a kinda unusual scenario but in addition to my sql parameters, I need to let the user / API define the table column name too. My problem with the params is that the query results in:

SELECT device_id, time, 's0' ...

instead of

SELECT device_id, time, s0 ...

Is there another way to do that through raw or would I need to escape the column by myself?

queryset = Measurement.objects.raw(
            '''
            SELECT device_id, time, %(sensor)s FROM measurements
            WHERE device_id=%(device_id)s AND time >= to_timestamp(%(start)s) AND time <= to_timestamp(%(end)s)
            ORDER BY time ASC;
            ''', {'device_id': device_id, 'sensor': sensor, 'start': start, 'end': end})
Yannic Hamann
  • 4,655
  • 32
  • 50
  • Allowing users to specify schema elements (as opposed to query values) is often a cause to use string concatenation, even though we're all so often told to "never do that". The good news is that the superset of schema elements is finite. So you don't actually have to include unescaped user input. You can create a list of "valid" user inputs based on the known schema elements (either hard-coded or dynamically queried from the database), compare the user input with that list, and if a match is found then concatenate that matched value into the raw SQL. – David Oct 10 '17 at 14:05
  • That's a good idea I could actually achieve that with a simple regex. – Yannic Hamann Oct 10 '17 at 14:11
  • 1
    Oh, in that case if this is what you're looking for then I can write this up as an answer. – David Oct 10 '17 at 14:12

2 Answers2

1

As with any potential for SQL injection, be careful.

But essentially this is a fairly common problem with a fairly safe solution. The problem, in general, is that query parameters are "the right way" to handle query values, but they're not designed for schema elements.

To dynamically include schema elements in your query, you generally have to resort to string concatenation. Which is exactly the thing we're all told not to do with SQL queries.

But the good news here is that you don't have to use the actual user input. This is because, while possible query values are infinite, the superset of possible valid schema elements is quite finite. So you can validate the user's input against that superset.

For example, consider the following process:

  1. User inputs a value as a column name.
  2. Code compares that value (raw string comparison) against a list of known possible values. (This list can be hard-coded, or can be dynamically fetched from the database schema.)
  3. If no match is found, return an error.
  4. If a match is found, use the matched known value directly in the SQL query.

So all you're ever using are the very strings you, as the programmer, put in the code. Which is the same as writing the SQL yourself anyway.

David
  • 208,112
  • 36
  • 198
  • 279
0

It doesn't look like you need raw() for the example query you posted. I think the following queryset is very similar.

measurements = Measurement.objects.filter(
    device_id=device_id, 
    to_timestamp__gte=start,
    to_timestamp__lte,
).order_by('time')

for measurement in measurements:
    print(getattr(measurement, sensor)

If you need to optimise and avoid loading other fields, you can use values() or only().

Alasdair
  • 298,606
  • 55
  • 578
  • 516
  • Thanks for your reply and I will upvote this. I am using the postgres extension timescaledb and I need to include the time_bucket (which is very similar to date_trunc) in the future so I guess (but I haven't been to that point yet) I need to stick with the raw option. – Yannic Hamann Oct 10 '17 at 14:18
  • You might be able to annotate the time_bucket instead of using `raw()`. See the [docs on expressions](https://docs.djangoproject.com/en/1.11/ref/models/expressions/). – Alasdair Oct 10 '17 at 14:21
  • I try to solve that problem here: https://stackoverflow.com/questions/46688158/annotate-custom-sql-function-similar-to-date-trunc-to-django-orm-queryset – Yannic Hamann Oct 11 '17 at 13:43