2

I have a set of transactions occurring at specific points in time:

CREATE TABLE Transactions (
    TransactionDate Date NOT NULL,
    TransactionValue Integer NOT NULL
)

The data might be:

INSERT INTO Transactions (TransactionDate, TransactionValue)
VALUES ('1/1/2009', 1)
INSERT INTO Transactions (TransactionDate, TransactionValue)
VALUES ('3/1/2009', 2)
INSERT INTO Transactions (TransactionDate, TransactionValue)
VALUES ('6/1/2009', 3)

Assuming that the TransactionValue sets some kind of level, I need to know what the level was between the transactions. I need this in the context of a set of T-SQL queries, so it would be best if I could get a result set like this:

Month   Value
1/2009  1
2/2009  1
3/2009  2
4/2009  2
5/2009  2
6/2009  3

Note how, for each month, we either get the value specified in the transaction, or we get the most recent non-null value.

My problem is that I have little idea how to do this! I'm only an "intermediate" level SQL Developer, and I don't remember ever seeing anything like this before. Naturally, I could create the data I want in a program, or using cursors, but I'd like to know if there's a better, set-oriented way to do this.

I'm using SQL Server 2008, so if any of the new features will help, I'd like to hear about it.

P.S. If anyone can think of a better way to state this question, or even a better subject line, I'd greatly appreciate it. It took me quite a while to decide that "spread", while lame, was the best I could come up with. "Smear" sounded worse.

pilcrow
  • 56,591
  • 13
  • 94
  • 135
John Saunders
  • 160,644
  • 26
  • 247
  • 397
  • I don't quite understand what you're trying to ask for. Are you wanting to compare the level of one month compared to the other? Or what all the levels were in a certain date span? – DForck42 Apr 30 '09 at 18:38
  • What are the effective levels for any given month. For instance, 2/2009 value is 1 because 1/2009 was 1, and nothing was specified for 2. April 2009 would have value 2 because of no April data. – John Saunders Apr 30 '09 at 18:40
  • so basically if a month doesn't have a level you want to insert a value for that month equal to what the previous month's level? – DForck42 Apr 30 '09 at 18:51
  • Yes. The previous month that doesn't have a level. I'll update my question with data making that more clear. – John Saunders Apr 30 '09 at 18:52
  • ok that makes more sense – DForck42 Apr 30 '09 at 18:53
  • "Determine values for missing months based on data of previous months in MS SQL 2008" sounds like a good title to me – DForck42 Apr 30 '09 at 19:19
  • My first question on SO, and it gets downvoted. Could someone suggest the reason for that, and what I could have improved in the question? So I can try harder next time? – John Saunders May 15 '09 at 11:21
  • Again? Cowards in action, or what? Downvoting is meant to help people ask better questions. If you don't say what the problem is, then you're not helping. – John Saunders Jun 16 '09 at 14:29

7 Answers7

4

I'd start by building a Numbers table holding sequential integers from 1 to a million or so. They come in really handy once you get the hang of it.

For example, here is how to get the 1st of every month in 2008:

select firstOfMonth = dateadd( month, n - 1, '1/1/2008')
from Numbers
where n <= 12;

Now, you can put that together using OUTER APPLY to find the most recent transaction for each date like so:

with Dates as (
    select firstOfMonth = dateadd( month, n - 1, '1/1/2008')
    from Numbers
    where n <= 12
)
select d.firstOfMonth, t.TransactionValue
from Dates d
outer apply (
    select top 1 TransactionValue
    from Transactions
    where TransactionDate <= d.firstOfMonth
    order by TransactionDate desc
) t;

This should give you what you're looking for, but you might have to Google around a little to find the best way to create the Numbers table.

John Gibb
  • 10,603
  • 2
  • 37
  • 48
1

If you do this type of analysis often, you might be interested in this SQL Server function I put together for exactly this purpose:

if exists (select * from dbo.sysobjects where name = 'fn_daterange') drop function fn_daterange;
go

create function fn_daterange
   (
   @MinDate as datetime,
   @MaxDate as datetime,
   @intval  as datetime
   )
returns table
--**************************************************************************
-- Procedure: fn_daterange()
--    Author: Ron Savage
--      Date: 12/16/2008
--
-- Description:
-- This function takes a starting and ending date and an interval, then
-- returns a table of all the dates in that range at the specified interval.
--
-- Change History:
-- Date        Init. Description
-- 12/16/2008  RS    Created.
-- **************************************************************************
as
return
   WITH times (startdate, enddate, intervl) AS
      (
      SELECT @MinDate as startdate, @MinDate + @intval - .0000001 as enddate, @intval as intervl
         UNION ALL
      SELECT startdate + intervl as startdate, enddate + intervl as enddate, intervl as intervl
      FROM times
      WHERE startdate + intervl <= @MaxDate
      )
   select startdate, enddate from times;

go

it was an answer to this question, which also has some sample output from it.

Community
  • 1
  • 1
Ron Savage
  • 10,923
  • 4
  • 26
  • 35
  • Ron, thanks. Turns out I didn't need to use it - found a way to not need to know - and besides, I don't work there (or anywhere) at the moment. – John Saunders Jul 05 '09 at 03:18
1

I don't have access to BOL from my phone so this is a rough guide...

First, you need to generate the missing rows for the months you have no data. You can either use a OUTER join to a fixed table or temp table with the timespan you want or from a programmatically created dataset (stored proc or suchlike)

Second, you should look at the new SQL 2008 'analytic' functions, like MAX(value) OVER ( partition clause ) to get the previous value.

(I KNOW Oracle can do this 'cause I needed it to calculate compounded interest calcs between transaction dates - same problem really)

Hope this points you in the right direction...

(Avoid throwing it into a temp table and cursoring over it. Too crude!!!)

BenMorel
  • 34,448
  • 50
  • 182
  • 322
Guy
  • 9,720
  • 7
  • 38
  • 42
1

here's what i came up with

declare @Transactions table (TransactionDate datetime, TransactionValue int)

declare @MinDate datetime
declare @MaxDate datetime
declare @iDate datetime
declare @Month int
declare @count int
declare @i int
declare @PrevLvl int

insert into @Transactions (TransactionDate, TransactionValue)
select '1/1/09',1

insert into @Transactions (TransactionDate, TransactionValue)
select '3/1/09',2

insert into @Transactions (TransactionDate, TransactionValue)
select '5/1/09',3


select @MinDate = min(TransactionDate) from @Transactions
select @MaxDate = max(TransactionDate) from @Transactions

set @count=datediff(mm,@MinDate,@MaxDate)
set @i=1
set @iDate=@MinDate


while (@i<=@count)
begin

    set @iDate=dateadd(mm,1,@iDate)

    if (select count(*) from @Transactions where TransactionDate=@iDate) < 1
    begin

        select @PrevLvl = TransactionValue from @Transactions where TransactionDate=dateadd(mm,-1,@iDate)

        insert into @Transactions (TransactionDate, TransactionValue)
        select @iDate, @prevLvl

    end


    set @i=@i+1
end

select *
from @Transactions
order by TransactionDate
DForck42
  • 19,789
  • 13
  • 59
  • 84
1

To do it in a set-based way, you need sets for all of your data or information. In this case there's the overlooked data of "What months are there?" It's very useful to have a "Calendar" table as well as a "Number" table in databases as utility tables.

Here's a solution using one of these methods. The first bit of code sets up your calendar table. You can fill it using a cursor or manually or whatever and you can limit it to whatever date range is needed for your business (back to 1900-01-01 or just back to 1970-01-01 and as far into the future as you want). You can also add any other columns that are useful for your business.

CREATE TABLE dbo.Calendar
(
     date           DATETIME     NOT NULL,
     is_holiday     BIT          NOT NULL,
     CONSTRAINT PK_Calendar PRIMARY KEY CLUSTERED (date)
)

INSERT INTO dbo.Calendar (date, is_holiday) VALUES ('2009-01-01', 1)  -- New Year
INSERT INTO dbo.Calendar (date, is_holiday) VALUES ('2009-01-02', 1)
...

Now, using this table your question becomes trivial:

SELECT
     CAST(MONTH(date) AS VARCHAR) + '/' + CAST(YEAR(date) AS VARCHAR) AS [Month],
     T1.TransactionValue AS [Value]
FROM
     dbo.Calendar C
LEFT OUTER JOIN dbo.Transactions T1 ON
     T1.TransactionDate <= C.date
LEFT OUTER JOIN dbo.Transactions T2 ON
     T2.TransactionDate > T1.TransactionDate AND
     T2.TransactionDate <= C.date
WHERE
     DAY(C.date) = 1 AND
     T2.TransactionDate IS NULL AND
     C.date BETWEEN '2009-01-01' AND '2009-12-31'  -- You can use whatever range you want
Tom H
  • 46,766
  • 14
  • 87
  • 128
  • Tom, this looks intriguing, but when I tried it, it comes up with duplicate rows - two rows for every input row. – John Saunders Apr 30 '09 at 20:48
  • Not sure what to say. I just copied your code and ran it then copied my code and ran it and I got back 12 rows, one for each month. Make sure that the Calendar table doesn't have duplicates. Otherwise, it's pretty much the same approach as the one that John Gibb gave. – Tom H May 01 '09 at 05:51
  • Maybe it's just too early in the morning, but I don't see the correspondence between your two left outer joins and his outer apply. Can you show me how they are equivalent, or how they differ? – John Saunders May 01 '09 at 10:25
  • Sorry, I meant more that they're the same in how they use a utility table. He uses the Numbers for months 1 to 12 and I use the Calendar with the first of each month. – Tom H May 01 '09 at 13:49
1

John Gibb posted a fine answer, already accepted, but I wanted to expand on it a bit to:

  • eliminate the one year limitation,
  • expose the date range in a more explicit manner, and
  • eliminate the need for a separate numbers table.

This slight variation uses a recursive common table expression to establish the set of Dates representing the first of each month on or after from and to dates defined in DateRange. Note the use of the MAXRECURSION option to prevent a stack overflow (!); adjust as necessary to accommodate the maximum number of months expected. Also, consider adding alternative Dates assembly logic to support weeks, quarters, even day-to-day.

with 
DateRange(FromDate, ToDate) as (
  select 
    Cast('11/1/2008' as DateTime), 
    Cast('2/15/2010' as DateTime)
),
Dates(Date) as (
  select 
    Case Day(FromDate) 
      When 1 Then FromDate
      Else DateAdd(month, 1, DateAdd(month, ((Year(FromDate)-1900)*12)+Month(FromDate)-1, 0))
    End
  from DateRange
  union all
  select DateAdd(month, 1, Date)
  from Dates
  where Date < (select ToDate from DateRange)
)
select 
  d.Date, t.TransactionValue
from Dates d
outer apply (
  select top 1 TransactionValue
  from Transactions
  where TransactionDate <= d.Date
  order by TransactionDate desc
) t
option (maxrecursion 120);
ewbi
  • 1,823
  • 1
  • 12
  • 9
0

-----Alternative way------

select 
    d.firstOfMonth,
    MONTH(d.firstOfMonth) as Mon,
    YEAR(d.firstOfMonth) as Yr, 
    t.TransactionValue
from (
    select 
        dateadd( month, inMonths - 1, '1/1/2009') as firstOfMonth 
        from (
            values (1), (2), (3), (4), (5), (7), (8), (9), (10), (11), (12)
        ) Dates(inMonths)
) d
outer apply (
    select top 1 TransactionValue
    from Transactions
    where TransactionDate <= d.firstOfMonth
    order by TransactionDate desc
) t
Doliveras
  • 1,794
  • 2
  • 14
  • 30
Deepak Shaw
  • 461
  • 3
  • 6