3

How can I have SQL repeat some set-based operation an arbitrary number of times without looping? How can I have SQL perform an operation against a range of numbers? I'm basically looking for a way to do a set-based for loop.

I know I can just create a small table with integers in it, say from 1 to 1000 and then use it for range operations that are within that range.

For example, if I had that table I could make a select to find the sum of numbers 100-200 like this:

select sum(n) from numbers where n between 100 and 200

Any ideas? I'm kinda looking for something that works for T-SQL but any platform would be okay.

[Edit] I have my own solution for this using SQL CLR which works great for MS SQL 2005 or 2008. See below.

Community
  • 1
  • 1
Hafthor
  • 16,358
  • 9
  • 56
  • 65
  • You have a lot of interesting answers here, but I'm still stuck on the same question... why do you want to do this? My guess is there is a fault in your goals or system design if you need this kind of functionality. (Not meaning to be harsh, I'm actually interested to see an example using sql where this is really needed). – Hogan Nov 30 '09 at 21:39

6 Answers6

5

I think the very short answer to your question is to use WITH clauses to generate your own.

Unfortunately, the big names in databases don't have built-in queryable number-range pseudo-tables. Or, more generally, easy pure-SQL data generation features. Personally, I think this is a huge failing, because if they did it would be possible to move a lot of code that is currently locked up in procedural scripts (T-SQL, PL/SQL, etc.) into pure-SQL, which has a number of benefits to performance and code complexity.

So anyway, it sounds like what you need in a general sense is the ability to generate data on the fly.

Oracle and T-SQL both support a WITH clause that can be used to do this. They work a little differently in the different DBMS's, and MS calls them "common table expressions", but they are very similar in form. Using these with recursion, you can generate a sequence of numbers or text values fairly easily. Here is what it might look like...

In Oracle SQL:

WITH
  digits AS  -- Limit recursion by just using it for digits.
    (SELECT
      LEVEL - 1 AS num
    FROM
      DUAL
    WHERE
      LEVEL < 10
    CONNECT BY
      num = (PRIOR num) + 1),
  numrange AS
    (SELECT
      ones.num
        + (tens.num * 10)
        + (hundreds.num * 100)
        AS num
    FROM
      digits ones
      CROSS JOIN
        digits tens
      CROSS JOIN
        digits hundreds
    WHERE
      hundreds.num in (1, 2)) -- Use the WHERE clause to restrict each digit as needed.
SELECT
  -- Some columns and operations
FROM
  numrange
  -- Join to other data if needed

This is admittedly quite verbose. Oracle's recursion functionality is limited. The syntax is clunky, it's not performant, and it is limited to 500 (I think) nested levels. This is why I chose to use recursion only for the first 10 digits, and then cross (cartesian) joins to combine them into actual numbers.

I haven't used SQL Server's Common Table Expressions myself, but since they allow self-reference, recursion is MUCH simpler than it is in Oracle. Whether performance is comparable, and what the nesting limits are, I don't know.

At any rate, recursion and the WITH clause are very useful tools in creating queries that require on-the-fly generated data sets. Then by querying this data set, doing operations on the values, you can get all sorts of different types of generated data. Aggregations, duplications, combinations, permutations, and so on. You can even use such generated data to aid in rolling up or drilling down into other data.

UPDATE: I just want to add that, once you start working with data in this way, it opens your mind to new ways of thinking about SQL. It's not just a scripting language. It's a fairly robust data-driven declarative language. Sometimes it's a pain to use because for years it has suffered a dearth of enhancements to aid in reducing the redundancy needed for complex operations. But nonetheless it is very powerful, and a fairly intuitive way to work with data sets as both the target and the driver of your algorithms.

Chris Ammerman
  • 14,978
  • 8
  • 41
  • 41
  • I agree that this is a huge failing. I found myself needing this type of thing many times and resorted to loops. – Hafthor Sep 22 '08 at 17:11
3

I created a SQL CLR table valued function that works great for this purpose.

SELECT n FROM dbo.Range(1, 11, 2) -- returns odd integers 1 to 11
SELECT n FROM dbo.RangeF(3.1, 3.5, 0.1) -- returns 3.1, 3.2, 3.3 and 3.4, but not 3.5 because of float inprecision. !fault(this)

Here's the code:

using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Collections;

[assembly: CLSCompliant(true)]
namespace Range {
    public static partial class UserDefinedFunctions {
        [Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true, SystemDataAccess = SystemDataAccessKind.None, IsPrecise = true, FillRowMethodName = "FillRow", TableDefinition = "n bigint")]
        public static IEnumerable Range(SqlInt64 start, SqlInt64 end, SqlInt64 incr) {
            return new Ranger(start.Value, end.Value, incr.Value);
        }

        [Microsoft.SqlServer.Server.SqlFunction(DataAccess = DataAccessKind.None, IsDeterministic = true, SystemDataAccess = SystemDataAccessKind.None, IsPrecise = true, FillRowMethodName = "FillRowF", TableDefinition = "n float")]
        public static IEnumerable RangeF(SqlDouble start, SqlDouble end, SqlDouble incr) {
            return new RangerF(start.Value, end.Value, incr.Value);
        }

        public static void FillRow(object row, out SqlInt64 n) {
            n =  new SqlInt64((long)row);
        }

        public static void FillRowF(object row, out SqlDouble n) {
            n = new SqlDouble((double)row);
        }
    }

    internal class Ranger : IEnumerable {
        Int64 _start, _end, _incr;

        public Ranger(Int64 start, Int64 end, Int64 incr) {
            _start = start; _end = end; _incr = incr;
        }

        public IEnumerator GetEnumerator() {
            return new RangerEnum(_start, _end, _incr);
        }
    }

    internal class RangerF : IEnumerable {
        double _start, _end, _incr;

        public RangerF(double start, double end, double incr) {
            _start = start; _end = end; _incr = incr;
        }

        public IEnumerator GetEnumerator() {
            return new RangerFEnum(_start, _end, _incr);
        }
    }

    internal class RangerEnum : IEnumerator {
        Int64 _cur, _start, _end, _incr;
        bool hasFetched = false;

        public RangerEnum(Int64 start, Int64 end, Int64 incr) {
            _start = _cur = start; _end = end; _incr = incr;
            if ((_start < _end ^ _incr > 0) || _incr == 0)
                throw new ArgumentException("Will never reach end!");
        }

        public long Current {
            get { hasFetched = true; return _cur; }
        }

        object IEnumerator.Current {
            get { hasFetched = true; return _cur; }
        }

        public bool MoveNext() {
            if (hasFetched) _cur += _incr;
            return (_cur > _end ^ _incr > 0);
        }

        public void Reset() {
            _cur = _start; hasFetched = false;
        }
    }

    internal class RangerFEnum : IEnumerator {
        double _cur, _start, _end, _incr;
        bool hasFetched = false;

        public RangerFEnum(double start, double end, double incr) {
            _start = _cur = start; _end = end; _incr = incr;
            if ((_start < _end ^ _incr > 0) || _incr == 0)
                throw new ArgumentException("Will never reach end!");
        }

        public double Current {
            get { hasFetched = true; return _cur; }
        }

        object IEnumerator.Current {
            get { hasFetched = true; return _cur; }
        }

        public bool MoveNext() {
            if (hasFetched) _cur += _incr;
            return (_cur > _end ^ _incr > 0);
        }

        public void Reset() {
            _cur = _start; hasFetched = false;
        }
    }
}

and I deployed it like this:

create assembly Range from 'Range.dll' with permission_set=safe -- mod path to point to actual dll location on disk.
go
create function dbo.Range(@start bigint, @end bigint, @incr bigint)
  returns table(n bigint)
  as external name [Range].[Range.UserDefinedFunctions].[Range]
go
create function dbo.RangeF(@start float, @end float, @incr float)
  returns table(n float)
  as external name [Range].[Range.UserDefinedFunctions].[RangeF]
go
Hafthor
  • 16,358
  • 9
  • 56
  • 65
  • Awesome... one thing I would do different is in your logic to test if we're ever going to complete, you should really do this: if (Math.Sign(_end - _start) != Math.Sign(_incr)) throw new ArgumentException(""Will never reach end!"); – IDisposable Aug 15 '09 at 06:50
  • Also, you could probably use a generic for all the base numeric types (byte, int, etc....) – IDisposable Aug 15 '09 at 06:51
  • in C# 3,5 this is a one-liner: `for (var i = start.Value; i < end.Value; i += incr.Value) yield return i;` – jeroenh Feb 26 '11 at 13:00
1

This is basically one of those things that reveal SQL to be less than ideal. I'm thinking maybe the right way to do this is to build a function that creates the range. (Or a generator.)

I believe the correct answer to your question is basically, "you can't". (Sorry.)

Anders Eurenius
  • 4,170
  • 2
  • 24
  • 20
  • I get why this is modded down. It's correct, but not particularly helpful. (Unless you, like me, think fail-fast is a good idea here.) – Anders Eurenius Sep 19 '08 at 18:33
1

You can use a common table expression to do this in SQL2005+.

WITH CTE AS
(
    SELECT 100 AS n
    UNION ALL
    SELECT n + 1 AS n FROM CTE WHERE n + 1 <= 200
)
SELECT n FROM CTE
Mike Powell
  • 5,914
  • 4
  • 28
  • 28
  • Nice answer! Unfortunately it doesn't work for any number bigger than 100, or whatever CTE maximum recursion level you specified ("The statement terminated. The maximum recursion 100 has been exhausted before statement completion"). – Wiebe Tijsma Sep 26 '11 at 15:18
  • Thanks! You can increase the recursion level up to 32,767 on a per-query basis by using the MAXRECURSION query hint (http://msdn.microsoft.com/en-us/library/ms181714.aspx). Still, this isn't a totally general solution for any range. – Mike Powell Sep 26 '11 at 19:57
0

Here's a hack you should never use:

select sum(numberGenerator.rank)
from
(
select
    rank =  ( select count(*)  
              from reallyLargeTable t1 
              where t1.uniqueValue > t2.uniqueValue ), 
    t2.uniqueValue id1, 
    t2.uniqueValue id2
from reallyLargeTable t2 
) numberGenerator
where rank between 1 and 10

You can simplify this using the Rank() or Row_Number functions in SQL 2005

ilitirit
  • 16,016
  • 18
  • 72
  • 111
0

If using SQL Server 2000 or greater, you could use the table datatype to avoid creating a normal or temporary table. Then use the normal table operations on it.

With this solution you have essentially a table structure in memory that you can use almost like a real table, but much more performant.

I found a good discussion here: Temporary tables vs the table data type

Sergio Acosta
  • 11,418
  • 12
  • 62
  • 91