4

I have a table called crewWork as follows :

CREATE TABLE crewWork( 
       FloorNumber int, AptNumber int, WorkType int, simTime int )

After the table was populated, I need to know how many times a change in apt occurred and how many times a change in floor occurred. Usually I expect to find 10 rows on each apt and 40-50 on each floor. I could just write a scalar function for that, but I was wondering if there's any way to do that in t-SQL without having to write scalar functions.

Thanks

The data will look like this:

FloorNumber  AptNumber    WorkType      simTime  
1            1            12            10  
1            1            12            25  
1            1            13            35  
1            1            13            47  
1            2            12            52  
1            2            12            59  
1            2            13            68  
1            1            14            75  
1            4            12            79  
1            4            12            89  
1            4            13            92  
1            4            14            105  
1            3            12            115  
1            3            13            129  
1            3            14            138  
2            1            12            142  
2            1            12            150  
2            1            14            168  
2            1            14            171  
2            3            12            180  
2            3            13            190  
2            3            13            200  
2            3            14            205  
3            3            14            216  
3            4            12            228  
3            4            12            231  
3            4            14            249  
3            4            13            260  
3            1            12            280  
3            1            13            295  
2            1            14            315  
2            2            12            328  
2            2            14            346  

I need the information for a report, I don't need to store it anywhere.

b3bel
  • 375
  • 1
  • 5
  • 15
  • Can you add sample data, input and output please? – gbn Dec 29 '11 at 09:59
  • 1
    How is a change represented? As a new row? (If you aren't storing it somewhere not sure how you expect scalar functions to help...) – Martin Smith Dec 29 '11 at 10:00
  • 1
    possible duplicate of [How to track number of changes occured in a column? T-SQL - SQL Server](http://stackoverflow.com/questions/2736934/how-to-track-number-of-changes-occured-in-a-column-t-sql-sql-server) – Lukas Eder Dec 29 '11 at 10:01
  • 1
    @LukasEder: probably not a duplicate, yet... – gbn Dec 29 '11 at 10:07
  • 2
    @b3bel: please update the question,don't add as a comment... – gbn Dec 29 '11 at 10:11
  • And what is the desired output for this please? – gbn Dec 29 '11 at 10:38

4 Answers4

11

If you use the accepted answer as written now (1/6/2023), you get correct results with the OP dataset, but I think you can get wrong results with other data.

CONFIRMED: ACCEPTED ANSWER HAS A MISTAKE (as of 1/6/2023)

I explain the potential for wrong results in my comments on the accepted answer.

In this db<>fiddle, I demonstrate the wrong results. I use a slightly modified form of accepted answer (my syntax works in SQL Server and PostgreSQL). I use a slightly modified form of the OP's data (I change two rows). I demonstrate how the accepted answer can be changed slightly, to produce correct results.

The accepted answer is clever but needs a small change to produce correct results (as demonstrated in the above db<>fiddle and described here:

  • Instead of doing this as seen in the accepted answer COUNT(DISTINCT AptGroup)...
  • You should do thisCOUNT(DISTINCT CONCAT(AptGroup, '_', AptNumber))...

DDL:


SELECT * INTO crewWork  FROM (VALUES
-- data from question, with a couple changes to demonstrate problems with the accepted answer
-- https://stackoverflow.com/q/8666295/1175496
--FloorNumber  AptNumber    WorkType      simTime  
(1,            1,            12,            10 ),
-- (1,            1,            12,            25 ), -- original
(2,            1,            12,            25 ), -- new, changing FloorNumber 1->2->1 
(1,            1,            13,            35 ),
(1,            1,            13,            47 ),
(1,            2,            12,            52 ),
(1,            2,            12,            59 ),
(1,            2,            13,            68 ),
(1,            1,            14,            75 ),
(1,            4,            12,            79 ),
-- (1,            4,            12,            89 ), -- original
(1,            1,            12,            89 ), -- new , changing  AptNumber 4->1->4)
(1,            4,            13,            92 ),
(1,            4,            14,            105 ),
(1,            3,            12,            115 ),
...

DML:

;
WITH groupedWithConcats as (SELECT
 *,
 CONCAT(AptGroup,'_', AptNumber) as AptCombo,
 CONCAT(FloorGroup,'_',FloorNumber) as FloorCombo
 -- SQL SERVER doesnt have TEMPORARY keyword; Postgres doesn't understand # for temp tables
 -- INTO TEMPORARY groupedWithConcats
 FROM
 ( 
   SELECT 
     -- the columns shown in Andriy's answer:
     -- https://stackoverflow.com/a/8667477/1175496
     ROW_NUMBER() OVER (                            ORDER BY simTime)   as RN,
     -- AptNumber   
     AptNumber,
     ROW_NUMBER() OVER (PARTITION BY AptNumber      ORDER BY simTime)   as RN_Apt,
     ROW_NUMBER() OVER (                            ORDER BY simTime)
   - ROW_NUMBER() OVER (PARTITION BY AptNumber      ORDER BY simTime)   as AptGroup,

     -- FloorNumber   
     FloorNumber,
     ROW_NUMBER() OVER (PARTITION BY FloorNumber    ORDER BY simTime)   as RN_Floor,
     ROW_NUMBER() OVER (                            ORDER BY simTime)
   - ROW_NUMBER() OVER (PARTITION BY FloorNumber    ORDER BY simTime)   as FloorGroup
  FROM crewWork
 )  grouped
)
-- if you want to see how the groupings work:
-- SELECT * FROM groupedWithConcats
-- otherwise just run this query to see the counts of "changes":
SELECT 
 COUNT(DISTINCT AptCombo)-1     as CountAptChangesWithConcat_Correct,
 COUNT(DISTINCT AptGroup)-1     as CountAptChangesWithoutConcat_Wrong,
 COUNT(DISTINCT FloorCombo)-1   as CountFloorChangesWithConcat_Correct,
 COUNT(DISTINCT FloorGroup)-1   as CountFloorChangesWithoutConcat_Wrong
FROM groupedWithConcats;  

ALTERNATIVE ANSWER

The accepted-answer may eventually get updated to remove the mistake. If that happens I can remove my warning but I still want leave you with this alternative way to produce the answer.

My approach goes like this: "check the previous row, if the value is different in previous row vs current row, then there is a change". SQL doesn't have idea or row order functions per se (at least not like in Excel for example; )

Instead, SQL has window functions. With SQL's window functions, you can use the window function RANK plus a self-JOIN technique as seen here to combine current row values and previous row values so you can compare them. Here is a db<>fiddle showing my approach, which I pasted below.

The intermediate table, showing the columns which has a value 1 if there is a change, 0 otherwise (i.e. FloorChange, AptChange), is shown at the bottom of the post...

DDL:

...same as above...

DML:

;
WITH rowNumbered AS (
  SELECT
     *,
     ROW_NUMBER() OVER ( 
         ORDER BY simTime)  as RN
  FROM crewWork
)
,joinedOnItself AS (
  SELECT 
     rowNumbered.*,
     rowNumberedRowShift.FloorNumber as FloorShift,
     rowNumberedRowShift.AptNumber as AptShift,
     CASE WHEN rowNumbered.FloorNumber <> rowNumberedRowShift.FloorNumber THEN 1 ELSE 0 END     as FloorChange,
     CASE WHEN rowNumbered.AptNumber <> rowNumberedRowShift.AptNumber THEN 1 ELSE 0 END         as AptChange
  
  FROM  rowNumbered
  LEFT OUTER JOIN rowNumbered as rowNumberedRowShift
  ON rowNumbered.RN = (rowNumberedRowShift.RN+1)
)
-- if you want to see:
-- SELECT * FROM joinedOnItself;
SELECT 
  SUM(FloorChange) as FloorChanges, 
  SUM(AptChange) as AptChanges
FROM joinedOnItself;

Below see the first few rows of the intermediate table (joinedOnItself). This shows how my approach works. Note the last two columns, which have a value of 1 when there is a change in FloorNumber compared to FloorShift (noted in FloorChange), or a change in AptNumber compared to AptShift (noted in AptChange).

floornumber aptnumber worktype simtime rn floorshift aptshift floorchange aptchange
1 1 12 10 1 0 0
2 1 12 25 2 1 1 1 0
1 1 13 35 3 2 1 1 0
1 1 13 47 4 1 1 0 0
1 2 12 52 5 1 1 0 1
1 2 12 59 6 1 2 0 0
1 2 13 68 7 1 2 0 0

Note instead of using the window function RANK and JOIN, you could use the window function LAG to compare values in the current row to the previous row directly (no need to JOIN). I don't have that solution here, but it is described in the Wikipedia article example:

Window functions allow access to data in the records right before and after the current record.

Nate Anderson
  • 18,334
  • 18
  • 100
  • 135
  • 3
    **You are correct.** The row number difference should be an *additional* grouping/partitioning criterion, not the only one, in order to produce the correct number of "islands". Around the time I posted my answer, I kept forgetting that. Either way, I'm glad you've not only caught this issue but posted the correct solution. Now to try and find a way to promote it so that it gets the votes it deserves... – Andriy M Jan 10 '23 at 11:25
  • 2
    Thank you @AndriyM, and for your clever solution in the first place which "got me thinking" – Nate Anderson Jan 10 '23 at 17:01
7

If I am not missing anything, you could use the following method to find the number of changes:

  • determine groups of sequential rows with identical values;

  • count those groups;

  • subtract 1.

Apply the method individually for AptNumber and for FloorNumber.

The groups could be determined like in this answer, only there's isn't a Seq column in your case. Instead, another ROW_NUMBER() expression could be used. Here's an approximate solution:

;
WITH marked AS (
  SELECT
    FloorGroup = ROW_NUMBER() OVER (                         ORDER BY simTime)
               - ROW_NUMBER() OVER (PARTITION BY FloorNumber ORDER BY simTime),

    AptGroup   = ROW_NUMBER() OVER (                         ORDER BY simTime)
               - ROW_NUMBER() OVER (PARTITION BY AptNumber   ORDER BY simTime)
  FROM crewWork
)
SELECT
  FloorChanges = COUNT(DISTINCT FloorGroup) - 1,
  AptChanges   = COUNT(DISTINCT AptGroup)   - 1
FROM marked

(I'm assuming here that the simTime column defines the timeline of changes.)


UPDATE

Below is a table that shows how the distinct groups are obtained for AptNumber.

AptNumber  RN  RN_Apt  AptGroup (= RN - RN_Apt)
---------  --  ------  ---------
1          1   1       0
1          2   2       0
1          3   3       0
1          4   4       0
2          5   1       4
2          6   2       4
2          7   3       4
1          8   5   =>  3
4          9   1       8
4          10  2       8
4          11  3       8
4          12  4       8
3          13  1       12
3          14  2       12
3          15  3       12
1          16  6       10
…          …   …       …

Here RN is a pseudo-column that stands for ROW_NUMBER() OVER (ORDER BY simTime). You can see that this is just a sequence of rankings starting from 1.

Another pseudo-column, RN_Apt contains values produces by the other ROW_NUMBER, namely ROW_NUMBER() OVER (PARTITION BY AptNumber ORDER BY simTime). It contains rankings within individual groups of identical AptNumber values. You can see that, for a newly encountered value, the sequence starts over, and for a recurring one, it continues where it stopped last time.

You can also see from the table that if we subtract RN from RN_Apt (could be the other way round, doesn't matter in this situation), we get the value that uniquely identifies every distinct group of same AptNumber values. You might as well call that value a group ID.

So, now that we've got these IDs, it only remains for us to count them (count distinct values, of course). That will be the number of groups, and the number of changes is one less (assuming the first group is not counted as a change).

Nate Anderson
  • 18,334
  • 18
  • 100
  • 135
Andriy M
  • 76,112
  • 17
  • 94
  • 154
  • 1
    can you elaborate on how you get the AptGroup, for instance? – b3bel Dec 29 '11 at 14:50
  • 1
    AptGroup should take into consideration the floor as well, as Apartment #1 on floor #1 is different that Apartment #1 on floor #2 – Gabriele Petrioli Dec 29 '11 at 15:29
  • 1
    @Gaby aka G. Petrioli: Actually, I might have got things wrong to a greater extent than that. After your comment I re-read the question, and it now seems to me that I'm counting entirely wrong things here... (Thank you, by the way!) – Andriy M Dec 29 '11 at 15:40
  • 1
    @AndriyM your approach works great. I've taken Gaby aka G. Petroli into consideration and change the table from apt number to aptID, since each apt has its ID independent of which floor it is in. Thanks. – b3bel Jan 04 '12 at 11:14
  • 1
    @AndriyM I love this technique, and apparently your approach works great for the OP :) But what about this case, start with your table, now where `RN` = **10**, imagine `AptNumber` had a the value **1** (instead of your 4). That would mean `RN_Apt` has the value **6** (right?). Therefore `Apt_Group` has the value 10-6 = **4** (right?) Doesn't this collide with the value of `Apt_Group` = **4** which was *already* used for previous rows (where `RN` is 5, 6, 7 -- and the `AptNumber` was 2?) So `COUNT(DISTINCT AptGroup)` would not increase? Maybe I'm mistaken; I should make a db<>fiddle. Thanks – Nate Anderson Jan 06 '23 at 02:52
  • 1
    Well I *think* I'm right, collisions are possible: Run [this fiddle](https://www.db-fiddle.com/f/nChGDyLhXeyznEBMbxG2KT/13) once as-is, you get Andriy's results. But uncomment the UPDATE statement in the fiddle (which makes the change I describe) and run again, you get "wrong" results (no increase in count of changes, even though you added two changes) -- maybe I'm missing something; Andriy's other answer says this technique [*avoids collision*](https://stackoverflow.com/questions/5087864/sql-query-for-grouping-the-results-based-on-sequence/5088487#comment5701830_5088487) – Nate Anderson Jan 06 '23 at 04:24
  • 1
    OK, the other answer [also has collision in its `SeqGroup` column](https://www.db-fiddle.com/f/2R2xs95nv4b8ZWmcrySCkd/2), but the collision is avoided because you `GROUP BY ( SeqGroup, ID )` **both** `SeqGroup` **and** `ID`. A similar approach would "fix" this answer, `COUNT(DISTINCT AptGroup, AptNumber)` using [some technique](https://stackoverflow.com/questions/1471250/counting-distinct-over-multiple-columns) to count distinct over multiple columns; [here I use `CONCAT`](https://www.db-fiddle.com/f/4g1CjgdaWF5xEdhGEHDHsi/0) – Nate Anderson Jan 06 '23 at 04:43
  • 1
    @TheRedPea: Your remarks are correct. And thanks for the column name fix. – Andriy M Jan 10 '23 at 11:23
2

add an extra column changecount

CREATE TABLE crewWork( 
       FloorNumber int, AptNumber int, WorkType int, simTime int ,changecount int)

increment changecount value for each updation

if want to know count for each field then add columns corresponding to it for changecount

Nighil
  • 4,099
  • 7
  • 30
  • 56
1

Assuming that each record represents a different change, you can find changes per floor by:

select FloorNumber, count(*)
from crewWork
group by FloorNumber

And changes per apartment (assuming AptNumber uniquely identifies apartment) by:

select AptNumber, count(*)
from crewWork
group by AptNumber

Or (assuming AptNumber and FloorNumber together uniquely identifies apartment) by:

select FloorNumber, AptNumber, count(*)
from crewWork
group by FloorNumber, AptNumber