7

I have 98w rows data. When I want sort my data with pub_time, I found an interest thing.

Here is the SQL:

select * 
from t_p_blog_article_info t  
order by t.pub_time desc

It cost 19s.

select * 
from t_p_blog_article_info t 
where t.pub_time > to_date( '1900-01-01 01:00:00', 'yyyy-mm-dd   hh24:mi:ss ')  
order by t.pub_time desc

It cost 0.2s.

I want to know, why?

Nick Vaccaro
  • 5,428
  • 6
  • 38
  • 60
sarowlwp
  • 105
  • 1
  • 11

2 Answers2

4

You probably have an index on pub_time on your table.

Therefore, the second query can make use of this index to return only those records with non-null dates after the specified date, whereas the first query has to query the whole table.

  • Yes i have index on pub_time , but why the first query not use the index? – sarowlwp Mar 13 '12 at 13:17
  • Though, properly speaking, both queries end up having to query the whole table, since they both have `SELECT *` and *presumably* both return all rows. (At least, I doubt the OP would be asking this question if the second query were returning many fewer rows.) – ruakh Mar 13 '12 at 13:18
  • @sarowlwp: Indices don't include null values, so if `pub_time` is nullable (even if it's never actually null), the index on it won't be sufficient for a query whose WHERE-clause doesn't exclude records where it's null. – ruakh Mar 13 '12 at 13:19
  • 1
    Because you may have null values in your date column - normal indexes in Oracle do not record nulls, so the index alone cannot be used to sort columns which may include nulls. –  Mar 13 '12 at 13:20
  • But is the column defined to be not nullable? –  Mar 13 '12 at 13:27
  • no, you mean if the column not defined to be not nullalbe , in the order by query ,it will not use the index ? – sarowlwp Mar 13 '12 at 14:24
  • Not unless you include a condition restricting the query to non-null values, since it could include null values - so without either the column constraint on the table, or the condition in the query, it has to check the entire table (since some of the desired results *could* have null values). –  Mar 13 '12 at 14:31
0

There are a range of possibilities. You could be filtering out large numbers of rows with invalid/null dates in pub_time, but I doubt that you'd fail to notice/mention a significant number of these.

The three things that stick out in my mind are:

1 - You have a index or composite index involving pub_time, and the restriction in your where clause is triggering the use of a different access path

2 - You had no statistics available for the optimizer when you ran your first query. When running the second query a better access path was selected thanks to some information caching that happened when you ran the first query. This can be verified by running the first query a few more times and seeing if there's a significant performance improvement.

3 - Similar to the first point, the optimizer could just be selecting a better access path based solely on the implications of the where clause. Perhaps giving the hint that null/invalid values will not have to be handled is enough - your system could be avoiding one or more full table scans to weed out invalid/null pub_times.

Pinpointing the reasons for things like this is quickly becoming an empirical venture - it's hard for me to say more without knowing your platform & version. From the tag I take it you're using oracle, in which case you should be able to use some form of "explain query" or "explain plan" tool to get a better sense of what's going on. For more information on the oracle optimizer see http://docs.oracle.com/cd/B10500_01/server.920/a96533/optimops.htm (This is for Oracle 9i v9.2, but it has a decent explanation of the version-independent concepts)

JPistone
  • 35
  • 9