7

Assume the following schema and query:

Please look past the glaring design issues with having values in a varchar column that we expect to be ints.

create table dbo.Parent (
    Id bigint NOT NULL,
    TypeId int NOT NULL
)

create table dbo.Child (
    Id bigint NOT NULL,
    ParentId bigint NOT NULL,
    TypeId int NOT NULL,
    varcharColumn varchar(300) NULL
)

select cast(c.varcharColumn as int)
from dbo.Parent p (nolock)
    inner join dbo.Child c (nolock)
        on p.Id = c.ParentId
            and c.TypeId = 2
where p.TypeId = 13

The break:

We get a cast break due to a value that cannot be converted to an int. In this case: "123-1". The strange thing is that the value being cast gets filtered out of the final result set.

For example, this returns zero results

select c.varcharColumn
from dbo.Parent p (nolock)
    inner join dbo.Child c (nolock)
        on p.Id = c.ParentId
            and c.TypeId = 2
where p.TypeId = 13
    and c.varcharColumn = '123-1'

The query plan ends up looking at the Child table and actually applying the cast function before the where clause.

We were able to fix this by creating a new index on the child table (it was doing a PK scan)

create index [NCIDX_dbo_Child__TypeId] on dbo.Child (
    TypeId
)
include (
    ParentId,
    varcharColumn
)

It now filters on the parent table's where clause first.

Is there any way to fix this without the extra index? And again, please refrain from any suggestions related to fixing our schema. That is definitely the proper fix in this case.

I'm mostly interested in understanding why it applied the cast before it filtered the result set.

Thanks

Edit - Answer:

Many thanks to both Aaron and Gordon. If I ever get more than 15 rep points, I'll come back and up both of your replies.

We ended up needing Gordon's answer since we wanted to use this query in a view. A few folks at the office were wary of using a case statement because they prefer to have more control over ensuring that we have a smaller result set first (Aaron's answer), however it all boils down to looking at the query plan and checking your read counts.

Again, thanks for all of the responses!

Community
  • 1
  • 1
Chris Barry
  • 93
  • 1
  • 1
  • 6

4 Answers4

7

You can't easily control the way SQL Server processes your query. You can figure out some of the why by deep diving into the execution plan, but understanding that is the least of your problems in this specific case I think. You can do a little with join hints, perhaps, but that's hacky to me and the behavior is still not guaranteed (especially as you move to new versions etc). Two workarounds you could try are:

;WITH c AS 
(
  SELECT varcharColumn, ParentID, TypeId
   FROM dbo.Child AS c
   WHERE c.TypeId = 2
   AND ISNUMERIC(varcharColumn) = 1 --*
)
SELECT CONVERT(INT, c.varcharColumn)
FROM dbo.Parent AS p
INNER JOIN c
ON c.ParentId = p.Id
WHERE p.TypeId = 13;

But I have heard of cases where even separating this out into a CTE could lead to the bad plan that led the convert to occur first. So it may be that you need to break it out even further:

SELECT varcharColumn, ParentID, TypeId
INTO #c
   FROM dbo.Child AS c
   WHERE c.TypeId = 2
   AND ISNUMERIC(varcharColumn) = 1; --*

SELECT CONVERT(INT, c.varcharColumn)
  FROM dbo.Parent AS p
  INNER JOIN #c AS c
  ON c.ParentId = p.Id
  WHERE p.TypeId = 13;

(I also talk about the CASE expression solution in this answer.)

If you are on SQL Server 2012, you can simply do this - now it doesn't matter if the convert is attempted before the filter, and you don't have to rely on the wonky ISNUMERIC() function.*

SELECT TRY_CONVERT(INT, c.varcharColumn)
  FROM dbo.Parent AS p
  INNER JOIN dbo.Child AS c
  ON c.ParentId = p.Id
  WHERE c.TypeId = 2
  AND p.TypeId = 13;

* Please note that IsNumeric is not perfect. I wrote this article several years ago to help deal with this.

Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
  • 1
    Hey Aaron, We didn't try a cte. I'm not sure if that would work, but it's a good idea. We did try a sub query (within the from clause) as invertedSpear recommends below, and it still busted. What would be the performance implications of IsNumeric? – Chris Barry Sep 06 '12 at 18:18
  • 1
    Performance implications are not what I'm worried about. It's that it will identify values that are numeric according to at least one numeric type, but can't be converted to an `INT` (e.g. `e` or `+`). So you may be better off doing something like `WHERE varcharColumn NOT LIKE '%[^0-9]%'` - don't quote me on that specific clause, that's from memory, but it should be something like that. – Aaron Bertrand Sep 06 '12 at 18:21
  • 1
    Also if the subquery didn't force it the CTE will probably not force it either. Can you post the actual execution plan somewhere? Is it possible the optimizer is making a bad choice due to a bad plan or outdated statistics? One thing you might try is updating the statistics on the table and/or evicting this specific plan from cache. – Aaron Bertrand Sep 06 '12 at 18:35
  • 1
    Unfortunately, I can't really give you the query plan as it requires a lot of data (and probably bad statistics) before it occurs and the only way to do that would be to give you our real table/column names. I checked, and we can't do that at the moment. We didn't try just updating the statistics, but that might have worked as well. – Chris Barry Sep 06 '12 at 21:08
7

First, this is not a "glaring design issue". SQL is a descriptive language of the output, not a procedural language that specifies how prcoessing is being done. There is no guarantee of the order of processing, in general, and this is an advantage. I might say there is a design issue, but it is around the general handling of exceptions in SQL statements.

According to SQL Server documentation (http://msdn.microsoft.com/en-us/library/ms181765.aspx), you can depend on the order of evauation for a CASE statement for scalar expressions. So, the following should work:

select (case when isnumeric(c.varcharColumn) = 1 then cast(c.varcharColumn as int) end)

Or, to get closer to an "int" expression:

select (case when isnumeric(c.varcharColumn) = 1 and c.varcharColumn not like '%.%' and c.varcharColumn not like '%e%'
             then cast(c.varcharColumn as int)
        end)

At least your code is doing an explicit CAST. This situation is much nastier when the casts are implicit (and there are hundreds of columns).

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 3
    Hi Gordon, I was referring to _our_ glaring design issue. Not an issue with MSSQL. As far as we came up with, the proper fix would be to have a separate ChildInt table. I'm curious, which solution (yours or Aaron's) would end up calling IsNumeric more times (ie. worse performing). It sounds like your response of using a case statement will ensure that it applies the IsNumeric check (and thus the cast) after the where clause. Thus running fewer times, and being cleaner. Did I understand correctly? – Chris Barry Sep 06 '12 at 18:25
  • 2
    @bizzarry . . . Yes, I agree that there is a design error. I would only worry about the call to isnumeric() in the rarest of circumstances -- say using these queries in a high volume transaction system. Intead, I would worry more about the (lack of) use of indexes and other considerations. – Gordon Linoff Sep 06 '12 at 18:28
  • 2
    @Gordon...Just a point of learning for me. Did I understand you correctly when you said that putting a case statement around the isnumeric call forces it to process the select statement last? – Chris Barry Sep 06 '12 at 18:37
  • 1
    @bizzarry . . . That is what the documentation says, for *scalar* quantities (not aggregations). It is also my experience in fixing this type of problem. – Gordon Linoff Sep 06 '12 at 19:19
  • 1
    For scalars a CASE expression is documented to "short circuit" - meaning if any of the conditions in the first check return false, the rest of that branch is discarded, so it should prevent your error in this case. As Gordon points out, this isn't always true for non-scalars. I pointed out a scenario with aggregates that seems to violate the documentation (or at least points out a missing disclaimer) here: http://dba.stackexchange.com/questions/12941/does-sql-server-read-all-of-a-coalesce-function-even-if-the-first-argument-is-no/12945#12945 – Aaron Bertrand Sep 06 '12 at 23:41
  • 1
    My point being, don't rely on CASE to generically short circuit this way in all cases. – Aaron Bertrand Sep 06 '12 at 23:41
  • As an aside, the `CASE` expression does not force the `SELECT` list to be processed last - it just means that, even if it is processed before the filter, it won't error out because the `CONVERT` is only attempted against values that match `ISNUMERIC() = 1`. So order doesn't really matter any more (and you can't control or predict it - today the `SELECT` list could be processed first, tomorrow you could get a different plan where the filter happens first). – Aaron Bertrand Oct 07 '14 at 13:12
0

Technical perspective to this as per me

 create table dbo.Parent (
    Id bigint NOT NULL,
    TypeId int NOT NULL
)
    create table dbo.Child (
        Id bigint NOT NULL,
        ParentId bigint NOT NULL,
        TypeId int NOT NULL,
        varcharColumn varchar(300) NULL
    )

    select cast(c.varcharColumn as int)
    from dbo.Parent p (nolock)
        inner join dbo.Child c (nolock)
            on p.Id = c.ParentId
                and c.TypeId = 2
    where p.TypeId = 13

while selection you are doing inner join that leads to table lookup and filters out the data Now according to me, SQL server is applying " so-called optimization" and making cast operation along with that. and obviously it is applying this cast before filtering that leads to conversion issue.

one more issue whose curiosity leads me here,

SELECT    MAX( Cast(dealer_number AS INT) + 1)
                    FROM   dealer_number dn
                    INNER JOIN dealer d
                    ON dn.dealer_seq = d.dealer_seq 
                    INNER JOIN dealer_type(nolock) dt
                    ON dt.dealer_number_seq = dn.dealer_number_seq
                    INNER JOIN program_dealer_type(nolock) pdt
                     ON pdt.program_dealer_type_seq = dt.program_dealer_type_seq
                    WHERE 
                    Isnumeric(dealer_number) = 1 AND
                    pdt.dealer_type = 'Dealer'
                    AND d.program_seq = 57

got following issue

The conversion of the varchar value '10054239051' overflowed an int column.

I have added Isnumeric for the 100% proof that I didn't get any alphanumeric number.

but as Isnumeric is being while table look for performance it leads to respective error

now just remove Isnumeric and it works fine

SELECT    MAX( Cast(dealer_number AS INT) + 1)
                    FROM   dealer_number dn
                    INNER JOIN dealer d
                    ON dn.dealer_seq = d.dealer_seq 
                    INNER JOIN dealer_type(nolock) dt
                    ON dt.dealer_number_seq = dn.dealer_number_seq
                    INNER JOIN program_dealer_type(nolock) pdt
                    ON pdt.program_dealer_type_seq = dt.program_dealer_type_seq
                    WHERE 
                    pdt.dealer_type = 'Dealer'
                    AND d.program_seq = 57

now I just removed that pre-processing filter and it just solves the problem but to 100% proof that I didn't get any alphanumeric number I had to implement case in the select

SELECT    MAX(case WHEN 
                    ISNUMERIC(DEALER_NUMBER )=1 
                        THEN  Cast(dealer_number AS INT) 
                        ELSE 0 END 
                    + 1) 
                    FROM   dealer_number dn
                    INNER JOIN dealer d
                    ON dn.dealer_seq = d.dealer_seq 
                    INNER JOIN dealer_type(nolock) dt
                    ON dt.dealer_number_seq = dn.dealer_number_seq
                    INNER JOIN program_dealer_type(nolock) pdt
                    ON pdt.program_dealer_type_seq = dt.program_dealer_type_seq
                    WHERE 
                    pdt.dealer_type = 'Dealer'
                    AND d.program_seq = 57

Above will solve both this issue. Anyone had a point on this most welcome. :)

rakeshyadvanshi
  • 294
  • 3
  • 9
-1

You can move the filter to a subquery, which would filter out the bad vals first, then cast everything. This doesn't answer why, but it does get you what you want in what is, in my opinion, the simplest way.

select cast(varcharColumn as int)
FROM(
select c.varcharColumn

from dbo.Parent p (nolock)
    inner join dbo.Child c (nolock)
        on p.Id = c.ParentId
            and c.TypeId = 2
where p.TypeId = 13
) table1
invertedSpear
  • 10,864
  • 5
  • 39
  • 77
  • 2
    Hi invertedSpear, thank you for your reply. I should have mentioned, we actually tried this. It still sticks with the same query plan and breaks. – Chris Barry Sep 06 '12 at 18:15