
I'm running Python with pyodbc against SQL Server. I have a very complex query, which I have minimized here as

DECLARE @mydate DATETIME = '2021-02-08'

SELECT * 
FROM atable 
WHERE adate = @mydate AND afield = ?

On the Python side I'm executing the usual

crsr.execute(sql, field)
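
For completeness, the surrounding code is something like the following (the connection string and the value of field are illustrative, not the real ones):

import pyodbc

# Illustrative connection; the real one also uses ODBC Driver 13 for SQL Server
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 13 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
)
crsr = conn.cursor()

sql = """
DECLARE @mydate DATETIME = '2021-02-08'

SELECT *
FROM atable
WHERE adate = @mydate AND afield = ?
"""

field = 'somevalue'  # illustrative value for the ? placeholder
crsr.execute(sql, field)
rows = crsr.fetchall()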

What is baffling me is that it returns all the results, ignoring the condition afield = ? with no other errors, but in a strange order, so that when I plot the graph it is very confused! Why does this happen? (Edit: of course, I should have added an ORDER BY.)

I have already fixed the code with an initial

DECLARE @myfield VARCHAR(32) = ?

followed by the WHERE condition ending with afield = @myfield, and now it works as expected: the order is the normal one even though I have not introduced an explicit ORDER BY.
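
Put together, the fixed query is therefore

DECLARE @mydate DATETIME = '2021-02-08'
DECLARE @myfield VARCHAR(32) = ?

SELECT * 
FROM atable 
WHERE adate = @mydate AND afield = @myfield

still executed with crsr.execute(sql, field).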

In other words, aside from the fact that the final correct fix is adding an ORDER BY, e.g.

SELECT * 
FROM atable 
WHERE adate = @mydate AND afield = ?
ORDER BY id

I'm wondering why introducing the change described above was enough to change the order.

  • I am unable to reproduce your issue. [This code](https://pastebin.com/EXXcSzU6) works fine for me. – Gord Thompson Feb 08 '21 at 19:18
  • @GordThompson you are right, my question, as it is written now, does not capture the issue... It needs to be edited somehow, but I'm still trying to minimize all the code without eliminating some relevant part... Sorry, I hope to be able to edit it soon –  Feb 08 '21 at 20:18
  • **SQL tables have no inherent order**; you may get the same order back every time, but this is absolutely not guaranteed, and depends on indexing and statistics, as well as parallelism (which is inherently unstable). If you want a specific order, use `ORDER BY` – Charlieface Feb 08 '21 at 21:00
  • @Charlieface yes, sure, maybe the question can be closed or I can simply delete it. The only reason to keep it open would be to discuss why the engine chooses two different orders for practically the same SQL. –  Feb 08 '21 at 21:05
  • If you compare the query plans and look at your indexing you will probably see why. We don't have that info in front of us so cannot say. See also https://use-the-index-luke.com/ – Charlieface Feb 08 '21 at 21:24
  • @Charlieface just in principle, why should the query plan change between declaring a variable via a question mark and using the declared variable, versus putting the question mark directly in the WHERE clause? I don't think the query plan would change... but OK, I'll report it in the question if I find a difference –  Feb 08 '21 at 21:30
  • Simple: because the variable means it has no visibility on what the value might be, and has to base the statistics off the average density, this may be better or worse, but won't change from run to run. It looks like Python is passing in `?` variables as text, which is a bad idea. Is there no way to pass in a proper parameter? – Charlieface Feb 08 '21 at 21:37
  • @Charlieface ok, that is enough of an answer for me, if you want to post it as such... And no, I think that pyodbc only supports this kind of parameter. –  Feb 08 '21 at 21:41

1 Answer


Because your SQL connection driver does not seem to support proper parameters, it is embedding the parameter as text. I have no idea how good its sanitizing and escaping method is, but this is not usually a good idea because of the risk of SQL injection.

I note that the pymssql driver does support named parameters.

What can happen is that when the compiler can see the exact value you want to use, it will estimate how many rows are likely to match (the "cardinality"), and therefore it may choose a different access pattern based on what indexes are available.

When the value comes through as a proper parameter (when using a good SQL driver), the parameter is "sniffed", and the compiler will use statistics based on the value seen in the first run of the query. This can be a benefit if there is a commonly used value.

If the value is in a local variable, then parameter sniffing is not used, and the compiler uses the average cardinality for that predicate. This may be better or worse than "sniffing".

It might sound like embedding the value is better, but this is not the case. When the value is embedded, each query stands on its own and must be compiled again every time. This can have a big impact on execution time and CPU usage, as well as on the memory used to store the query plans.
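
To make the three cases concrete, here is a minimal sketch in pyodbc terms (an illustration, not part of the original answer; conn is an open connection and 'somevalue' a placeholder value):

crsr = conn.cursor()

# 1. Value embedded as literal text: the compiler sees the exact value
#    and can use its statistics, but the plan is recompiled per value.
crsr.execute("SELECT * FROM atable WHERE afield = 'somevalue'")

# 2. Proper parameter: the plan is compiled once, "sniffing" the value
#    supplied on the first run (whether ? actually arrives as a real
#    parameter depends on the driver, as discussed in the comments).
crsr.execute("SELECT * FROM atable WHERE afield = ?", "somevalue")

# 3. Local variable: sniffing is bypassed; the compiler falls back to
#    the average cardinality for the predicate.
crsr.execute(
    "DECLARE @v VARCHAR(32) = ? "
    "SELECT * FROM atable WHERE afield = @v",
    "somevalue",
)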

Charlieface
  • Thank you again for your answer! I assume that pyodbc supports **proper** parameters (even if not **named** parameters like the pymssql you mentioned), and that it corresponds to the case where *the proper parameter is sniffed when using a good SQL driver*. This info in your answer - and the difference with a local variable, where parameter sniffing is **never** used - is very valuable, and it is not even reported in https://use-the-index-luke.com/sql/where-clause/bind-parameters, which you suggested I look at. Is there a more in-depth source I could read, or is it just a matter of DBA experience? –  Feb 09 '21 at 10:32
  • It seems, from a little bit of research, that pyodbc does **not** support parameters. SQL Server does not have unnamed parameters, so `?` is just being embedded as text. Re parameter sniffing on variables, see https://sqlperformance.com/2013/08/t-sql-queries/parameter-sniffing-embedding-and-the-recompile-options – Charlieface Feb 09 '21 at 10:40
  • @Giulio Sorry, https://www.red-gate.com/simple-talk/sql/t-sql-programming/parameter-sniffing/ – Charlieface Feb 09 '21 at 10:55
  • Thank you very much again! So, if I continue with pyodbc, and if I correctly understand the bottom line of your answer, DECLARE with local variables is perhaps better (since they are proper variables, even though with sniffing disabled) than direct question marks in the WHERE clause (because those would be text replacements, based on your research, so that would be worse in terms of recompiling the query plan every time, and so on). –  Feb 09 '21 at 11:08
  • Well, in this case you will still end up with recompilation of the `DECLARE` statement, but that is unlikely to have a great cost in CPU, and I don't think such query plans are actually saved. Yes, that is probably the best option for pyodbc. But I **highly** recommend you look at getting a specific SQL Server driver rather than using a generic ODBC driver. I wouldn't trust parameter embedding as far as I can throw it. – Charlieface Feb 09 '21 at 11:12
  • Just to be clear, I'm using 'DRIVER={ODBC Driver 13 for SQL Server};' in pyodbc.connect (I'm on W2012 R2 SE). Isn't that specific? –  Feb 09 '21 at 11:22
  • And btw, as a marginal side note, I'm quite convinced that it supports proper bind parameters: see 1) https://github.com/mkleehammer/pyodbc/wiki/Getting-started#parameters 2) https://github.com/mkleehammer/pyodbc/wiki/Cursor#executesql-parameters 3) https://stackoverflow.com/a/32749863/11323942 or 4) https://github.com/mkleehammer/pyodbc/wiki/Binding-Parameters –  Feb 09 '21 at 13:01
  • Set up SQL Profiler and see what actually comes through. Also check `sys.dm_exec_query_plan` for the generated query plan. I'm not familiar with pyodbc, just telling you what I saw in the documentation. It could be that it generates `@p0`, `@p1` etc. for each `?` – Charlieface Feb 09 '21 at 13:26
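
A sketch of the check suggested in the last comment, querying the plan cache to see how the statement actually arrived (assumes an open pyodbc connection conn and VIEW SERVER STATE permission; atable as in the question):

check_sql = """
SELECT st.text, qp.query_plan
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE st.text LIKE '%atable%'
"""
for text, plan in conn.cursor().execute(check_sql):
    print(text)  # shows whether ? arrived as a literal or as an @P1-style parameter
    print(plan)  # the cached query plan as XML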