10

Quick version: Which is the best of the following and why? (or is there a better way):

SELECT FLOOR(LOG10(Number))+1 AS NumLength FROM Table
SELECT LEN(CONVERT(VARCHAR, Number)) AS NumLength FROM Table
SELECT LEN(CAST(Number AS VARCHAR(10))) AS NumLength FROM Table

A bit more detail:
I wish to identify the most efficient mechanism for calculating the length of the string representation of an integer (more specifically a natural number - always >0).

I'm using MS SQL Server (2005).

I've come up with the 3 solutions above, all of which seem to work fine.

I know the third version may have issues with very large integers, but for now we may assume that "Number" is never more than 9 decimal digits long.

Yet more detail: (you don't have to read this bit to answer my question)
This query is used heavily in a transaction processing environment.
Up to now, I have got away with the assumption that "Number" is always exactly 6 digits long.
However, now I must update the code to support anywhere from 4 to 9 digits.

This SQL is part of a condition to identify the card scheme of a card.

The full query attempts to find the records matching the start of the card number against the Start and End range.

So full SQL condition would be something like this:

WHERE 
-- Start and End match
((Start=End OR End=0) AND (Start=CAST(LEFT('<card number>', FLOOR(LOG10(Start))+1) AS BIGINT))) OR 

-- Start != End
-- >= Start
(Start<=CAST(LEFT('<card number>', FLOOR(LOG10(Start))+1) AS BIGINT) AND 
-- <= End
End>=CAST(LEFT('<card number>', FLOOR(LOG10(Start))+1) AS BIGINT))

NOTE:
I can redesign the table to use VARCHARs instead of INTs. This would allow me to use "LEN(Start)" instead of "FLOOR(LOG10(Start))+1)" however the condition will then have much more CASTs.
I'd prefer to continue to deal in INTs as the DB schema will stay the same, and in any case dealing with INTs should be faster than VARCHARs.

IF I change the fields to VARCHARs, my condition might be:

WHERE 
-- Start and End match
((Start=End OR LEN(End)=0) AND (Start=LEFT('<card number>', LEN(Start)))) OR 

-- Start != End
-- >= Start
(CAST(Start AS BIGINT)<=CAST(LEFT('<card number>', LEN(Start)) AS BIGINT) AND 
-- <= End
CAST(End AS BIGINT)>=CAST(LEFT('<card number>', LEN(Start)) AS BIGINT))

Many thanks for any help,
Dave

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
user390935
  • 255
  • 1
  • 4
  • 9
  • I've just realised that the "FLOOR(LOG10(Number))+1" solution fails if Number is an integer power of 10. This is OK in my own situation - as the software does not handle any card numbers starting with "1000". However, I realise that this makes the solution not complete for general applications. – user390935 Mar 21 '11 at 13:09
  • 1
    If you only want the length and not the string itself it might be faster to use an case and check on the value - you spare the string-conversion (don't know the specific sql-server syntax, so this is pseudocode): if < 10 then 1 else if < 100 then 2 else if < 1000 then 3 ... else 10 end – MacGucky Mar 21 '11 at 13:10
  • Why not [create a big numbers table](http://stackoverflow.com/questions/10819/sql-auxiliary-table-of-numbers/2663232#2663232) and test this out for yourself looking at the output of `SET STATISTICS TIME ON` – Martin Smith Mar 21 '11 at 13:11
  • I read the "more detail"-part: if you don't mind one extra column and need maximum processing performance you can create an extra column for the length. So you don't need to calculate it on processing of the data. – MacGucky Mar 21 '11 at 13:14
  • @MacGucky - You probably wouldn't want the `CASE` expressions that way round though. – Martin Smith Mar 21 '11 at 15:35
  • @Adam - BTW I found the mathematical version to be potentially the slowest. – Martin Smith Mar 21 '11 at 17:11
  • @user390935 Why not FLOOR(LOG10(Number-1))+1? I said without testing yet. – Gary Nov 06 '18 at 15:28

2 Answers2

2

On my machine versions 2 and 3 come out about equal and beat the other two.

Edit: Though it has just occurred to me that my original test was a bit unfair on CASE as ordering the statements in ascending numerical order means that only 10 possible numbers would meet the first condition and exit early. I've added an additional test below. You might also try nesting CASE statements to do a binary search.

SET NOCOUNT ON
SET STATISTICS TIME ON

  PRINT 'Test 1';

   WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
 SELECT MAX(FLOOR(LOG10(N))+1)
   FROM cteTally
  WHERE N <= 10000000;

  PRINT 'Test 2';

     WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
 SELECT MAX(LEN(CONVERT(VARCHAR, N)))
   FROM cteTally
  WHERE N <= 10000000;


  PRINT 'Test 3';

     WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
 SELECT MAX(LEN(CAST(N AS VARCHAR(10))))
   FROM cteTally
  WHERE N <= 10000000;

  PRINT 'Test 4';

     WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
SELECT MAX(CASE
             WHEN N < 10 THEN 1
             WHEN N < 100 THEN 2
             WHEN N < 1000 THEN 3
             WHEN N < 10000 THEN 4
             WHEN N < 100000 THEN 5
             WHEN N < 1000000 THEN 6
             WHEN N < 10000000 THEN 7
             WHEN N < 100000000 THEN 8
           END)
FROM   cteTally
WHERE  N <= 10000000;   

  PRINT 'Test 5';

     WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
 SELECT MAX(CASE 
             WHEN N >= 100000000 THEN NULL
             WHEN N >= 10000000 THEN 8
             WHEN N >= 1000000  THEN 7
             WHEN N >= 100000   THEN 6
             WHEN N >= 10000    THEN 5
             WHEN N >= 1000     THEN 4
             WHEN N >= 100      THEN 3
             WHEN N >= 10       THEN 2   
             ELSE                    1
            END   )
   FROM cteTally
  WHERE N <= 10000000;

Results from an example run on my machine are

Test 1
   CPU time = 9422 ms,  elapsed time = 9523 ms.

Test 2
   CPU time = 7021 ms,  elapsed time = 7130 ms.

Test 3
   CPU time = 6864 ms,  elapsed time = 7006 ms.

Test 4
   CPU time = 9328 ms,  elapsed time = 9456 ms.

Test 5
   CPU time = 6989 ms,  elapsed time = 7358 ms.    
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • And CAST seems to be always a tiny bit faster than CONVERT. – Andriy M Mar 21 '11 at 13:37
  • @Andriy - I found the opposite. For me the 2nd one beats the 3rd consistently but by such a small amount I'm not sure if its statistically significant. (Edit: Just rerun it and 3rd beat 2nd that time!) – Martin Smith Mar 21 '11 at 13:40
  • @Andriy M - last I checked, there was a single internal operator called `CONVERT` that will be used by both `CAST` and `CONVERT` operations - try using showplan_xml, and then opening as XML (rather than the diagram tool) – Damien_The_Unbeliever Mar 21 '11 at 13:42
  • @Damien - Correct. The only difference between them in the plans is the length of the `varchar`. `[Expr1128] = Scalar Operator(MAX(len(CONVERT(varchar(30),[Expr1127],0))))` and `[Expr1128] = Scalar Operator(MAX(len(CONVERT(varchar(10),[Expr1127],0))))` – Martin Smith Mar 21 '11 at 13:45
  • 1
    @Martin: Might be the different altitude or climate or something like that. :) Anyway, it was insignificant in my case too. – Andriy M Mar 21 '11 at 13:46
  • Thanks - that's great. I think I need to learn some more SQL. [Still trying to decipher what you've written above]. Next I just need to determine if it's better to convert an int to string or string to int for the = comparison. – user390935 Mar 21 '11 at 14:52
  • @user390935 - It uses cross joined CTEs and `row_number` to generate all numbers from 1 to 10000000 then calculates the length. The `max` is just to stop it returning loads of rows to the client that could add to the elapsed time significantly. – Martin Smith Mar 21 '11 at 15:04
1

To answer your question, the second version is clearer about what you actually want. Think about what someone looking at this code in six months will think: will they realize that the first version is trying obtain the length of a number represented in decimal, or will they think that you're performing some obscure mathematical operation that they can't find documentation requiring?

More generally, though, you should probably consider storing these values as character data anyway, since they aren't representing real "numbers" to you (you aren't comparing based upon relative value, you aren't performing arithmetic, etc.). You can use CHECK constraints to ensure that only numeric digits are in the field.

I'm not clear on why storing them as character data would require conversions in your queries, assuming that you're consistent. There's also no reason to assume that dealing with ints would be faster than varchar, especially if there's conversion involved in both cases.

Adam Robinson
  • 182,639
  • 35
  • 285
  • 343