12

The title doesn't quite capture what I mean, and this may be a duplicate.

Here's the long version: given a guest's name, their registration date, and their checkout date, how do I generate one row for each day that they were a guest?

Ex: Bob checks in 7/14 and leaves 7/17. I want

('Bob', 7/14), ('Bob', 7/15), ('Bob', 7/16), ('Bob', 7/17) 

as my result.

Thanks!

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Daniel Cotter
  • 1,342
  • 2
  • 15
  • 27
  • Have a look here http://stackoverflow.com/questions/1478951/tsql-generate-a-resultset-of-incrementing-dates [1]: http://stackoverflow.com/questions/1478951/tsql-generate-a-resultset-of-incrementing-dates – StoicFnord Jun 21 '12 at 15:28
  • 3
    Generally, you don't. You have a look-up table and pick them out of there. `WHERE calendar.date >= user.start_date AND calendar.date <= user.leave_date` You *CAN* generate sets using loops, or recursive queries, but they are never as fast as using a look-up table. – MatBailie Jun 21 '12 at 15:28
  • I asked a very similar question, but mine was hours, not days. You could change to fit your need pretty easily. http://stackoverflow.com/questions/10986344/get-every-hour-for-a-time-range – Limey Jun 21 '12 at 15:56
  • Please specify the version of SQL Server. I posted a solution that depends on SQL Server 2008; it may differ if you are using SQL Server 2005. – Aaron Bertrand Jun 21 '12 at 15:58
  • @Dems That's a good answer. I was hoping to find that there was a language construct in SQL that made it easy to generate a range. – Daniel Cotter Jun 21 '12 at 16:46
  • I meant no offense. Really. http://meta.stackexchange.com/q/122986/133242 – Matt Ball Jun 21 '12 at 17:05
  • @MattBall If I had had slightly-but-not-quite-working code, I would've included it. As it was, however, I was drawing a complete blank for about fifteen minutes before I decided to ask the question. SQL's set-based paradigm is more alien to my way of thinking about things than any procedural language or functional language, and sometimes I don't even know where to start. Hence, the lack of even-slightly-working code. – Daniel Cotter Jun 21 '12 at 17:13
  • possible duplicate of [SQL Server 2008 Generate a Series of date times](http://stackoverflow.com/questions/7149271/sql-server-2008-generate-a-series-of-date-times) – 200_success Jul 24 '15 at 03:44

5 Answers5

34

I would argue that for this specific purpose the below query is about as efficient as using a dedicated lookup table.

DECLARE @start DATE, @end DATE;
SELECT @start = '20110714', @end = '20110717';

;WITH n AS 
(
  SELECT TOP (DATEDIFF(DAY, @start, @end) + 1) 
    n = ROW_NUMBER() OVER (ORDER BY [object_id])
  FROM sys.all_objects
)
SELECT 'Bob', DATEADD(DAY, n-1, @start)
FROM n;

Results:

Bob     2011-07-14
Bob     2011-07-15
Bob     2011-07-16
Bob     2011-07-17

Presumably you'll need this as a set, not for a single member, so here is a way to adapt this technique:

DECLARE @t TABLE
(
    Member NVARCHAR(32), 
    RegistrationDate DATE, 
    CheckoutDate DATE
);

INSERT @t SELECT N'Bob', '20110714', '20110717'
UNION ALL SELECT N'Sam', '20110712', '20110715'
UNION ALL SELECT N'Jim', '20110716', '20110719';

;WITH [range](d,s) AS 
(
  SELECT DATEDIFF(DAY, MIN(RegistrationDate), MAX(CheckoutDate))+1,
    MIN(RegistrationDate)
    FROM @t -- WHERE ?
),
n(d) AS
(
  SELECT DATEADD(DAY, n-1, (SELECT MIN(s) FROM [range]))
  FROM (SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
  FROM sys.all_objects) AS s(n)
  WHERE n <= (SELECT MAX(d) FROM [range])
)
SELECT t.Member, n.d
FROM n CROSS JOIN @t AS t
WHERE n.d BETWEEN t.RegistrationDate AND t.CheckoutDate;
----------^^^^^^^ not many cases where I'd advocate between!

Results:

Member    d
--------  ----------
Bob       2011-07-14
Bob       2011-07-15
Bob       2011-07-16
Bob       2011-07-17
Sam       2011-07-12
Sam       2011-07-13
Sam       2011-07-14
Sam       2011-07-15
Jim       2011-07-16
Jim       2011-07-17
Jim       2011-07-18
Jim       2011-07-19

As @Dems pointed out, this could be simplified to:

;WITH natural AS 
(
  SELECT ROW_NUMBER() OVER (ORDER BY [object_id]) - 1 AS val 
  FROM sys.all_objects
) 
SELECT t.Member, d = DATEADD(DAY, natural.val, t.RegistrationDate) 
  FROM @t AS t INNER JOIN natural 
  ON natural.val <= DATEDIFF(DAY, t.RegistrationDate, t.CheckoutDate);
Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
  • AFAIK SQL Server's optimiser means that you don't really need the `WHERE n < = (SELECT MAX())` which means that this can be even further simplified... `WITH natural AS (SELECT ROW_NUMBER() OVER (ORDER BY id) - 1 AS val FROM sys.objects) SELECT t.Member, DATEADD(DAY, natural.val, t.start) FROM @t AS t INNER JOIN natural ON natural.val <= DATEDIFF(DAY, t.start, t.end)` *[But, even then, a straight look-up table is still going to use less CPU cycles at the very least.]* – MatBailie Jun 21 '12 at 16:33
  • @Dems when I started writing my goal was to use the highest range in `TOP` against `sys.all_objects`. You're right that it could be simplified. – Aaron Bertrand Jun 21 '12 at 16:37
  • Thanks, your query does what exactly what I was looking for. One question -- is it necessary to use MAX and MIN on the 'range' table? In this example, I'm only seeing one row generated for 'range', so there is only one candidate for either max or min (in which case I would just put the range and start date in regular variables). I'm quite impressed with your SQL chops and curious if there's some subtlety there I'm missing. – Daniel Cotter Jun 21 '12 at 17:56
  • That is meant for the case when you're dealing with more than one user, and there may be overlapping dates. If you're only dealing with one user's single visit then you shouldn't need to use that version of the query at all. – Aaron Bertrand Jun 21 '12 at 17:58
  • Hmm. Our production box, with thousands of guests and overlapping dates is still only returning one row for [range]. – Daniel Cotter Jun 21 '12 at 20:19
  • If you want help troubleshooting query results, you need to show sample data, the exact query you're running, and the results you expect. I can't troubleshoot your production box if I can't see it, and I can't troubleshoot the results if I don't know how you got them. – Aaron Bertrand Jun 21 '12 at 20:22
  • Sorry, I should have been more clear about my intentions. I didn't mean for you to think I was asking you to troubleshoot my query for me. The code you provided is working just fine. What you did is just a way of using the language that I've never seen before, and I was trying to figure out what you were doing for my own understanding. If you're busy, feel free not to follow up on this. I appreciate all the help you've given me already. – Daniel Cotter Jun 21 '12 at 20:28
  • The range is only supposed to return one row. It is just two values: the number of days between the smallest and largest date in the range, and the starting date in the range. These values are used in the next query. – Aaron Bertrand Jun 21 '12 at 20:31
  • Well, what I'm saying is that for readability, I would prefer something like a DECLARE and SET of a variable -- the CTE expression isn't very readable to me. On the dataset that you provided (Jim, Bob, and Sam), I was only getting one row in the [range] table. On the much larger dataset I'm working with at work, I'm still only getting one row, in which case a DECLARE and SET is feasible. Under what conditions would I actually *need* to use a CTE for this purpose? When wouldn't it be advisable to use a regular variable? – Daniel Cotter Jun 21 '12 at 20:34
  • If, say, you wanted to put this in a function. An inline table-valued function could use a CTE and is much easier to optimize than one that declares local variables (which make the function a multi-statement function instead of inline). I prefer having a single query than a bunch of independent variables especially if I have to run other queries to populate them. – Aaron Bertrand Jun 21 '12 at 20:37
  • Please don't post any code in a comment. It is impossible to read. And readability is subjective. If you'd rather use a variable, that's ok, use a variable. – Aaron Bertrand Jun 21 '12 at 20:38
  • Okay, that explanation makes sense. At first I thought you were saying that using a variable would actually get incorrect results in some cases, but it sounds more like an issue of performance (and perhaps style). Well, thanks for all the help. I certainly do appreciate it. – Daniel Cotter Jun 21 '12 at 20:48
  • No that comment had nothing to do with variables vs. CTEs. Just that if you're only dealing with a single set of contiguous dates from a single row, you don't need to determine all the potential overlaps. – Aaron Bertrand Jun 21 '12 at 20:50
  • Okay, but my point was that I was using the code you posted with both a small data set (yours) and a large data set (mine) with many different guests and many overlapping dates, and *still* was only getting one row in the [range] table, in which case an aggregate function and a CTE seem unnecessary (except for the performance reasons you mentioned). I can't post my data here for confidentiality reasons, or I would be happy to share, but I feel like I'm missing something (a decade or so of SQL experience, I'm guessing). – Daniel Cotter Jun 21 '12 at 20:56
  • Like I said. You don't have to use a CTE in *either* case. That's just the style I prefer. If you want to use variables, go ahead. The CTE is ***NEVER*** going to produce more than one row, no matter how big your data set is. – Aaron Bertrand Jun 21 '12 at 20:57
  • I'm still a little confused, sadly (chalk it up to being a SQL amateur), but seriously, *thanks* for all the help. Your Deep Dive books look awesome, by the way, probably gonna pick one up. – Daniel Cotter Jun 21 '12 at 21:04
  • One important note here is that a finite number of rows will be returned from sys.all_objects. If you have a relatively small database but require a relatively large date range, this solution could be augmented with a cross join to a multiplier table. i.e. `select row_number() over (order by a.object_id) as Counter from sys.all_objects a, (select 1 as multiplier union all select 2 as multiplier union all select 3 as multiplier) m` – Corbin Jan 06 '20 at 18:28
8

I usually do this with a trick using row_number() on some table. So:

select t.name, dateadd(d, seq.seqnum, t.start_date)
from t left outer join
     (select row_number() over (order by (select NULL)) as seqnum
      from t
     ) seq
     on seqnum <= datediff(d, t.start_date, t.end_date)

The calculation for seq goes pretty fast, since no calculation or ordering is required. However, you need to be sure the table is big enough for all time spans.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
2

If you have a "Tally" or "Numbers" table, life get's real simple for things like this.

 SELECT Member, DatePresent = DATEADD(dd,t.N,RegistrationDate)
   FROM @t 
  CROSS JOIN dbo.Tally t
  WHERE t.N BETWEEN 0 AND DATEDIFF(dd,RegistrationDate,CheckoutDate)
;

Here's how to build a "Tally" table.

--===================================================================
--      Create a Tally table from 0 to 11000
--===================================================================
--===== Create and populate the Tally table on the fly.
 SELECT TOP 11001
        IDENTITY(INT,0,1) AS N
   INTO dbo.Tally
   FROM Master.sys.ALL_Columns ac1
  CROSS JOIN Master.sys.ALL_Columns ac2
;
--===== Add a CLUSTERED Primary Key to maximize performance
  ALTER TABLE dbo.Tally
    ADD CONSTRAINT PK_Tally_N 
        PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
;
--===== Allow the general public to use it
  GRANT SELECT ON dbo.Tally TO PUBLIC
;
GO

For more information on what a "Tally" table is in SQL and how it can be used to replace While loops and the "Hidden RBAR" of reursive CTEs that count, please see the following article.

http://www.sqlservercentral.com/articles/T-SQL/62867/

Jeff Moden
  • 3,271
  • 2
  • 27
  • 23
  • I really like the concept. It would be great if there were a built-in virtual table like this to join on (where no memory or disk IO would be wasted). If you ever suggest a virtual Tally table feature on the [SQL Server feedback forms](https://feedback.azure.com/forums/908035-sql-server) then send me the link and I'll vote for it! – Louis Somers Apr 09 '19 at 10:29
  • Aparently someone already did: [https://feedback.azure.com/forums/908035-sql-server/suggestions/32890519-add-a-built-in-table-of-numbers](https://feedback.azure.com/forums/908035-sql-server/suggestions/32890519-add-a-built-in-table-of-numbers) – Louis Somers Apr 09 '19 at 11:53
0

This may work for ya:

with mycte as
 (
     select cast('2000-01-01' as datetime) DateValue, 'Bob' as Name
     union all
     select DateValue + 1 ,'Bob' as Name
     from    mycte   
     where   DateValue + 1 < '2000-12-31'
 )
 select *
from    mycte
OPTION (MAXRECURSION 0)
smurtagh
  • 529
  • 6
  • 17
  • 2
    That contains a "Counting Recursive CTE". See the following article for why they're so very bad even when counting small numbers. http://www.sqlservercentral.com/articles/T-SQL/74118/ – Jeff Moden Jun 25 '12 at 04:56
-6

I would create a trigger to create extra records and run it upon checkout. Alternatively, you can have a daily midnight job doing the same (if you need up-to-date info in your database).

Andy
  • 1,618
  • 11
  • 13
  • 1
    This isn't really an answer - how does the trigger "create extra records"? – Aaron Bertrand Jun 21 '12 at 15:36
  • @AaronBertrand this is trivial programming task in any language. – Andy Jun 21 '12 at 15:40
  • 3
    If it were so trivial, the OP wouldn't be asking, right? And it shouldn't be hard to actually back up your answer with some code for *this* language? – Aaron Bertrand Jun 21 '12 at 15:42
  • @AaronBertrand I thought it is not about implementation but about approach. – Andy Jun 21 '12 at 16:35
  • 1
    I guess we interpret "how do I generate one row for each day that they were a guest" differently. To me that sounds like a question about specific syntax, not "go write a query." – Aaron Bertrand Jun 21 '12 at 16:38
  • 2
    @Andy... You wrote "@AaronBertrand this is trivial programming task in any language". Let's see what you've got. Post the trigger code. – Jeff Moden Jun 25 '12 at 05:00