1

I have a (in my oppinion) tricky SQL problem.

I got a table with subscriptions. Each subscription has an ID and a set of attributes which will change over time. When an attribute value changes a new row is created with the subscription key and the new values – but ONLY for the changed attributes. The values for the attributes that weren’t changed are left empty. It looks something like this (I left out the ValidTo and ValidFrom dates that I use to sort the result correctly):

SubID  Att1  Att2   
 1      J
 1            L 
 1      B
 1            H
 1      A     H

I need to transform this table so I can get the following result:

SubID  Att1  Att2   
 1      J     
 1      J     L
 1      B     L
 1      B     H
 1      A     H

So basically; if an attribute is empty then take the previous value for that attribute. Anything solution goes…. I mean it doesn’t matter what I have to do to get the result: a view on top of the table, an SSIS package to create a new table or something third.

Andriy M
  • 76,112
  • 17
  • 94
  • 154
NicolajB
  • 275
  • 1
  • 6
  • 16
  • Does your RDBMS support windowing functions? –  Dec 06 '12 at 14:01
  • SQL tables are inherently unordered. How do you define the ordering of the rows? – Gordon Linoff Dec 06 '12 at 14:12
  • @GordonLinoff "(I left out the ValidTo and ValidFrom dates that I use to sort the result correctly)" –  Dec 06 '12 at 14:23
  • @Jack Douglas, nope :-( I'm running on a 2008R2 – NicolajB Dec 06 '12 at 14:44
  • 1
    2008R2 does have window functions: **[OVER() clause](http://msdn.microsoft.com/en-us/library/ms189461(v=sql.105).aspx)** (but alas, not `LEAD()` or `LAG()`) – ypercubeᵀᴹ Dec 06 '12 at 14:54
  • @ypercube and even 2012 [does not support `ignore nulls`](http://msdn.microsoft.com/en-us/library/hh231256.aspx) in `lag`/`lead` –  Dec 06 '12 at 14:59
  • Thanks for saying NicolajB: I'll delete my answer as it doesn't help on SQL Server –  Dec 06 '12 at 15:06

6 Answers6

1

Assuming (based on the fact that you mentioned SSIS) you can use OUTER APPLY to get the previous row:

DECLARE @T TABLE (SubID INT, Att1 CHAR(1), Att2 CHAR(2), ValidFrom DATETIME);
INSERT @T VALUES
    (1, 'J', '', '20121201'),
    (1, '', 'l', '20121202'),
    (1, 'B', '', '20121203'),
    (1, '', 'H', '20121204'),
    (1, 'A', 'H', '20121205');

SELECT  T.SubID,
        Att1 = COALESCE(NULLIF(T.att1, ''), prev.Att1, ''),
        Att2 = COALESCE(NULLIF(T.att2, ''), prev.Att2, '')
FROM    @T T
        OUTER APPLY
        (   SELECT  TOP 1 Att1, Att2
            FROM    @T prev
            WHERE   prev.SubID = T.SubID
            AND     prev.ValidFrom < t.ValidFrom
            ORDER BY ValidFrom DESC
        ) prev
ORDER BY T.ValidFrom;

(I've had to add random values for ValidFrom to ensure the order by is correct)

EDIT

The above won't work if you have multiple consecutive rows with blank values - e.g.

DECLARE @T TABLE (SubID INT, Att1 CHAR(1), Att2 CHAR(2), ValidFrom DATETIME);
INSERT @T VALUES
    (1, 'J', '', '20121201'),
    (1, '', 'l', '20121202'),
    (1, 'B', '', '20121203'),
    (1, '', 'H', '20121204'),
    (1, '', 'J', '20121205'),
    (1, 'A', 'H', '20121206');

If this is likely to happen you will need two OUTER APPLYs:

SELECT  T.SubID,
        Att1 = COALESCE(NULLIF(T.att1, ''), prevAtt1.Att1, ''),
        Att2 = COALESCE(NULLIF(T.att2, ''), prevAtt2.Att2, '')
FROM    @T T
        OUTER APPLY
        (   SELECT  TOP 1 Att1
            FROM    @T prev
            WHERE   prev.SubID = T.SubID
            AND     prev.ValidFrom < t.ValidFrom
            AND     COALESCE(prev.Att1 , '') != ''
            ORDER BY ValidFrom DESC
        ) prevAtt1
        OUTER APPLY
        (   SELECT  TOP 1 Att2
            FROM    @T prev
            WHERE   prev.SubID = T.SubID
            AND     prev.ValidFrom < t.ValidFrom
            AND     COALESCE(prev.Att2 , '') != ''
            ORDER BY ValidFrom DESC
        ) prevAtt2
ORDER BY T.ValidFrom;

However, since each OUTER APPLY is only returning one value I would change this to a correlated subquery, since the above will evaluate PrevAtt1.Att1 and `PrevAtt2.Att2' for every row whether required or not. However if you change this to:

SELECT  T.SubID,
        Att1 = COALESCE(
                    NULLIF(T.att1, ''), 
                    (   SELECT  TOP 1 Att1
                        FROM    @T prev
                        WHERE   prev.SubID = T.SubID
                        AND     prev.ValidFrom < t.ValidFrom
                        AND     COALESCE(prev.Att1 , '') != ''
                        ORDER BY ValidFrom DESC
                    ), ''),
        Att2 = COALESCE(
                    NULLIF(T.att2, ''), 
                    (   SELECT  TOP 1 Att2
                        FROM    @T prev
                        WHERE   prev.SubID = T.SubID
                        AND     prev.ValidFrom < t.ValidFrom
                        AND     COALESCE(prev.Att2 , '') != ''
                        ORDER BY ValidFrom DESC
                    ), '')
FROM    @T T
ORDER BY T.ValidFrom;

The subquery will only evaluate when required (ie. when Att1 or Att2 is blank) rather than for every row. The execution plan does not show this, and in fact the "Actual Execution Plan" of the latter appears more intensive it almost certainly won't be. But as always, the key is testing, run both on your data and see which performs the best, and check the IO statistics for reads etc.

Andriy M
  • 76,112
  • 17
  • 94
  • 154
GarethD
  • 68,045
  • 10
  • 83
  • 123
1

this one works in oracle 11g

  select  SUBID
         ,NVL(ATT1,LAG(ATT1) over(order by ValidTo)) ATT1
         ,NVL(ATT2,lag(ATT2) over(order by ValidTo)) ATT2 
  from table_name

i agree with Gordon Linoff and Jack Douglas.this code has limitation as when multiple records with nulls are inserted.. but below code will handle that..

select SUBID
      ,NVL(ATT1,LAG(ATT1 ignore nulls) over(order by VALIDTO)) ATT1
      ,NVL(ATT2,LAG(ATT2 ignore nulls) over(order by VALIDTO)) ATT2
from Table_name

please see sql fiddle http://sqlfiddle.com/#!4/3b530/4

GKV
  • 501
  • 4
  • 8
1

You can do this with a correlated subquery:

select t.subid,
       (select t2.att1 from t t2 where t2.rowid <= t.rowid and t2.att1 is not null order by rowid desc limit 1) as att1,
       (select t2.att2 from t t2 where t2.rowid <= t.rowid and t2.att2 is not null order by rowid desc limit 1) as att1
from t

This assumes that you have a rowid or equivalent (such as date time created) that specifies the ordering of the rows. It also uses limit to limit the results. In other databases, this might use top instead. (And Oracle uses a slightly more complex expression.)

I would write this using ValidTo. However, because there is ValidTo and ValidFrom, the actual expression is much more complicated. I would need for the question to clarify the rules for using these values with respect to imputing values at other times.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Good point about ValidTo/ValidFrom, others are like me assuming that the data is good and there are no gaps/overlaps I think. –  Dec 06 '12 at 15:00
0
with Tricky1 as (
    Select SubID, Att1, Att2, row_number() over(order by ValidFrom) As rownum 
    From Tricky
)
select T1.SubID, T1.Att1, T2.Att2
from Tricky1 T1
cross join Tricky1 T2
where (ABS(T1.rownum-T2.rownum) = 1 or (T1.rownum = 1 and T2.rownum = 1))
and T1.Att1 is not null
;

Also, have a look at accessing previous value, when SQL has no notion of previous value, here.

Community
  • 1
  • 1
user1826905
  • 558
  • 3
  • 9
0

I never touched SQL Server, but I read that it supports analytical functions just like Oracle.

> select * from MYTABLE order by ValidFrom;

     SUBID A A VALIDFROM
---------- - - -------------------
         1 J   2012-12-06 15:14:51
         2 j   2012-12-06 15:15:20
         1   L 2012-12-06 15:15:31
         2   l 2012-12-06 15:15:39
         1 B   2012-12-06 15:15:48
         2 b   2012-12-06 15:15:55
         1   H 2012-12-06 15:16:03
         2   h 2012-12-06 15:16:09
         1 A H 2012-12-06 15:16:20
         2 a h 2012-12-06 15:16:29


select
  t.SubID
 ,last_value(t.Att1 ignore nulls)over(partition by t.SubID order by t.ValidFrom rows between unbounded preceding and current row) as Att1
 ,last_value(t.Att2 ignore nulls)over(partition by t.SubID order by t.ValidFrom rows between unbounded preceding and current row) as Att2
 ,t.ValidFrom
from MYTABLE t;

     SUBID A A VALIDFROM
---------- - - -------------------
         1 J   2012-12-06 15:45:33
         1 J L 2012-12-06 15:45:41
         1 B L 2012-12-06 15:45:49
         1 B H 2012-12-06 15:45:58
         1 A H 2012-12-06 15:46:06
         2 j   2012-12-06 15:45:38
         2 j l 2012-12-06 15:45:44
         2 b l 2012-12-06 15:45:53
         2 b h 2012-12-06 15:46:02
         2 a h 2012-12-06 15:46:09
colemar
  • 125
  • 4
  • I'm afraid I don't think it does: http://msdn.microsoft.com/en-us/library/hh231517.aspx –  Dec 06 '12 at 14:57
0

I was at it for quite a while now. I found a rather simple way of doing it. Not the best solution as such as i know there must be other way, but here it goes.

I had to consolidates duplicates too and in 2008R2.

So if you can try to create a table which contains one set of duplicates records.

According to your example create one table where 'ATT1' is blank. Then use Update queries with Inner join on 'SubId' to populate the data that you need

ankyskywalker
  • 1,462
  • 2
  • 9
  • 20