Select rows where column value has changed

Question

Let's say I have the following table:

Value    Time
0        15/06/2012 8:03:43 PM
1        15/06/2012 8:03:43 PM     *
1        15/06/2012 8:03:48 PM 
1        15/06/2012 8:03:53 PM
1        15/06/2012 8:03:58 PM     
2        15/06/2012 8:04:03 PM     *
2        15/06/2012 8:04:08 PM
3        15/06/2012 8:04:13 PM     *
3        15/06/2012 8:04:18 PM
3        15/06/2012 8:04:23 PM
2        15/06/2012 8:04:28 PM     *
2        15/06/2012 8:04:33 PM

How do I select the starred rows, that is, where Value has changed? Basically I'm trying to find the time when Value has changed so I can do other queries based on those time intervals. The solution shouldn't depend on knowing Value or Time in advance.

It seems to me that this shouldn't be very hard (but it's hard enough for me apparently!).

I'm currently using SQL Server 2008 although I have access to 2012 if the new window/analytic functions are helpful.

I tried adapting the solutions here http://blog.sqlauthority.com/2011/11/24/sql-server-solution-to-puzzle-simulate-lead-and-lag-without-using-sql-server-2012-analytic-function/ but my query didn't complete after an hour! I think the joins explode the row size to something unmanageable (or I screwed it up).

I can solve this problem with C# code and multiple db calls, but it seems like something that could be done in a table-valued function or SP which would be much nicer.

Also, a solution that only works with increasing Value is OK if that is easier.

Aaron Bertrand · Accepted Answer · 2012-06-20T20:25:01.340

34

I think this is what you're after:

;WITH x AS
(
  SELECT value, time, rn = ROW_NUMBER() OVER 
  (PARTITION BY Value ORDER BY Time)
  FROM dbo.table
)
SELECT * FROM x WHERE rn = 1;

This may be slow if the resultset is large and there isn't a good supporting index...

EDIT

Ah, wait a second, the values go up and down, not just up... if that is the case you can try this much slower approach:

DECLARE @x TABLE(value INT, [time] DATETIME)

INSERT @x VALUES
(0,'20120615 8:03:43 PM'),--
(1,'20120615 8:03:43 PM'),--*
(1,'20120615 8:03:48 PM'),--
(1,'20120615 8:03:53 PM'),--
(1,'20120615 8:03:58 PM'),--
(2,'20120615 8:04:03 PM'),--*
(2,'20120615 8:04:08 PM'),--
(3,'20120615 8:04:13 PM'),--*
(3,'20120615 8:04:18 PM'),--
(3,'20120615 8:04:23 PM'),--
(2,'20120615 8:04:28 PM'),--*
(2,'20120615 8:04:33 PM');

;WITH x AS
(
  SELECT *, rn = ROW_NUMBER() OVER (ORDER BY time)
  FROM @x
)
SELECT x.value, x.[time]
FROM x LEFT OUTER JOIN x AS y
ON x.rn = y.rn + 1
AND x.value <> y.value
WHERE y.value IS NOT NULL;

Results:

value  time
-----  -----------------------
1      2012-06-15 20:03:43.000
2      2012-06-15 20:04:03.000
3      2012-06-15 20:04:13.000
2      2012-06-15 20:04:28.000

edited Jun 20 '12 at 20:25

answered Jun 20 '12 at 20:18

Aaron Bertrand

272,866
37
466
490

1

Wow that was fast. Value increasing is OK. I'll try it. – agentnega Jun 20 '12 at 20:23
Fantastic, thanks @Aaron Bertrand. On my data the second version took less than double the time of the first version, so the performance difference isn't too bad. – agentnega Jun 20 '12 at 20:40
How to convert this query to also include an ID (new column) in the resulting output? – ThinkCode Sep 10 '15 at 21:20
@ThinkCode you'll need to be more specific about what you want in that column. Right now I could put just about anything in a new column, but that wouldn't be very useful. – Aaron Bertrand Sep 10 '15 at 21:39
@Aaron : http://pastebin.com/LT0ygatC Instead of treating all rows for one ID, we now have 2 different IDs. Hope I made sense! Thank you – ThinkCode Sep 10 '15 at 21:48
@ThinkCode Add PARTITION BY ID to the OVER() clause, then a sprinkling of ORDER BY on the outer query. – Aaron Bertrand Sep 10 '15 at 22:01
Tried your suggestions. http://pastebin.com/mMjzuyUT No go, what am I doing wrong? – ThinkCode Sep 10 '15 at 22:12
@ThinkCode Well, I'm not sure where `x.name` came from. Did you mean `x.ID`? Also, a specific error is usually more useful than "no go", for future reference. With an error message I could have spotted the error in the code without having to reproduce the problem. – Aaron Bertrand Sep 10 '15 at 23:29
Sorry, it was x.ID not x.name. There is no error per say. In pastebin, I provided the output. It is repeating duplicate rows and not the intended result. – ThinkCode Sep 11 '15 at 02:59
@ThinkCode Sorry, I can't follow your narrative and pastebin then. Please ask a separate question. – Aaron Bertrand Sep 11 '15 at 03:06
@AaronBertrand hi aaron if I don't want the last value means 2 2012-06-15 20:04:28.000 I don't want then – Amitesh Oct 24 '17 at 10:07

score 14 · Answer 2 · answered May 29 '14 at 15:12

14

DECLARE @x TABLE(value INT, [time] DATETIME)

INSERT @x VALUES
(0,'20120615 8:03:43 PM'),--
(1,'20120615 8:03:43 PM'),--*
(1,'20120615 8:03:48 PM'),--
(1,'20120615 8:03:53 PM'),--
(1,'20120615 8:03:58 PM'),--
(2,'20120615 8:04:03 PM'),--*
(2,'20120615 8:04:08 PM'),--
(3,'20120615 8:04:13 PM'),--*
(3,'20120615 8:04:18 PM'),--
(3,'20120615 8:04:23 PM'),--
(2,'20120615 8:04:28 PM'),--*
(2,'20120615 8:04:33 PM');


; with temp as
(
SELECT 
    value, [time],  lag(value,1,-1) over (order by [time] ) as lastValue
FROM    @x
) 
SELECT 
    [value],[time] 
FROM 
    temp 
WHERE value <> lastValue

Results:

value   time
---------------------------
0   2012-06-15 20:03:43.000
1   2012-06-15 20:03:43.000
2   2012-06-15 20:04:03.000
3   2012-06-15 20:04:13.000
2   2012-06-15 20:04:28.000

answered May 29 '14 at 15:12

valo

249
2
2

3

Neat, but worth mentioning this is only supported from SQL 2012 onward. – Steve Pettifer May 29 '14 at 15:19
This was super fast! The query for previous versions worked 11 minutes and after that I had to cancel it. – Vojtěch Dohnal Sep 02 '15 at 14:15
LAG() and LEAD() are super-useful window functions - I use them on Presto DB and they save repeated scans through the same dataset - much faster – Rob Potter Jun 12 '20 at 13:55
This was EXACTLY what I needed for one of my queries. Thank you!! – Russ R Dec 01 '22 at 15:30

Chetan Kumar · Answer 3 · 2018-02-22T04:55:28.920

We can do this using sub queries also

SELECT sub1.value, sub1.time FROM 
  (SELECT *,rn,id FROM 
     (SELECT *,row_number() over (partition by value order by time) AS rn, row_number() over (order by time) AS id FROM x ) order by time) sub1
  LEFT OUTER JOIN 
  (SELECT *,rn,id FROM 
     (SELECT *,row_number() over (partition by value order by time) AS rn, row_number() over (order by time) AS id FROM x ) order by time) sub2
  ON sub1.id = sub2.id + 1 
  WHERE sub1.rn - sub2.rn <> 1 OR sub2.rn IS NULL;

So, I have compared the values of 2 rows if it changes then the difference of rn will be not equal to 1 otherwise rn value will increment by 1 so, I have picked all the rows whose difference with next row's rn value is not 1 and sub2.rn IS NULL is used for the first row because the join will occur from id = 2.

Select rows where column value has changed

3 Answers3

Linked

Related