How to select unique subsequences in SQL?

Question

In generic terms I have a sequence of events, from which i'd like to select unique non-repeatable sequences using MS SQL Server 2008 R2.

Specifically in this case, each test has a series of recordings, each of which have a specific sequence of stimuli. I'd like to select the unique sequences of stimuli from inside the recordings of one test, insert them into another table and assign the sequence group id to the original table.

DECLARE @Sequence TABLE
([ID]  INT 
,[TestID]  INT
,[StimulusID] INT
,[RecordingID]  INT
,[PositionInRecording] INT
,[SequenceGroupID] INT
)

INSERT @Sequence
VALUES
(1,  1, 101, 1000, 1, NULL),
(2,  1, 102, 1000, 2, NULL),
(3,  1, 103, 1000, 3, NULL),
(4,  1, 103, 1001, 1, NULL),
(5,  1, 103, 1001, 2, NULL),
(6,  1, 101, 1001, 3, NULL),
(7,  1, 102, 1002, 1, NULL),
(8,  1, 103, 1002, 2, NULL),
(9,  1, 101, 1002, 3, NULL),
(10, 1, 102, 1003, 1, NULL),
(11, 1, 103, 1003, 2, NULL),
(12, 1, 101, 1003, 3, NULL),
(13, 2, 106, 1004, 1, NULL),
(14, 2, 107, 1004, 2, NULL),
(15, 2, 107, 1005, 1, NULL),
(16, 2, 106, 1005, 2, NULL)

After correctly identifying the unique sequences, the results should look like this

DECLARE @SequenceGroup TABLE
([ID]  INT 
,[TestID] INT
,[SequenceGroupName]  NVARCHAR(50)
)

INSERT @SequenceGroup VALUES
(1, 1, '101-102-103'),
(2, 1, '103-103-101'),
(3, 1, '102-103-101'),
(4, 2, '106-107'),
(5, 2, '107-106')


DECLARE @OutcomeSequence TABLE
([ID]  INT 
,[TestID]  INT
,[StimulusID] INT
,[RecordingID]  INT
,[PositionInRecording] INT
,[SequenceGroupID] INT
)

INSERT @OutcomeSequence
VALUES
(1,  1, 101, 1000, 1, 1),
(2,  1, 102, 1000, 2, 1),
(3,  1, 103, 1000, 3, 1),
(4,  1, 103, 1001, 1, 2),
(5,  1, 103, 1001, 2, 2),
(6,  1, 101, 1001, 3, 2),
(7,  1, 102, 1002, 1, 3),
(8,  1, 103, 1002, 2, 3),
(9,  1, 101, 1002, 3, 3),
(10, 1, 102, 1003, 1, 3),
(11, 1, 103, 1003, 2, 3),
(12, 1, 101, 1003, 3, 3),
(13, 2, 106, 1004, 1, 4),
(14, 2, 107, 1004, 2, 4),
(15, 2, 107, 1005, 1, 5),
(16, 2, 106, 1005, 2, 5)

score 1 · Accepted Answer · edited May 23 '17 at 11:55

1

This is fairly easy to do in MySQL and other databases that support some version of GROUP_CONCAT functionality. It's apparently a good deal harder in SQL Server. Here's a stackoverflow question that discusses one technique. Here's another with some information about SQL Server 2008 specific solutions that might also get you started.

edited May 23 '17 at 11:55

Community

1
1

answered Dec 17 '10 at 13:05

Larry Lustig

49,320
14
110
160

score 0 · Answer 2 · answered Dec 17 '10 at 15:36

This will do it. Had to add an column to @SequenceGroup.

DECLARE @Sequence TABLE
([ID]  INT 
,[TestID]  INT
,[StimulusID] INT
,[RecordingID]  INT
,[PositionInRecording] INT
,[SequenceGroupID] INT
)

INSERT @Sequence
VALUES
(1,  1, 101, 1000, 1, NULL),
(2,  1, 102, 1000, 2, NULL),
(3,  1, 103, 1000, 3, NULL),
(4,  1, 103, 1001, 1, NULL),
(5,  1, 103, 1001, 2, NULL),
(6,  1, 101, 1001, 3, NULL),
(7,  1, 102, 1002, 1, NULL),
(8,  1, 103, 1002, 2, NULL),
(9,  1, 101, 1002, 3, NULL),
(10, 1, 102, 1003, 1, NULL),
(11, 1, 103, 1003, 2, NULL),
(12, 1, 101, 1003, 3, NULL),
(13, 2, 106, 1004, 1, NULL),
(14, 2, 107, 1004, 2, NULL),
(15, 2, 107, 1005, 1, NULL),
(16, 2, 106, 1005, 2, NULL)

DECLARE @SequenceGroup TABLE
([ID]  INT IDENTITY(1, 1)
,[TestID] INT
,[SequenceGroupName]  NVARCHAR(50)
,[RecordingID] INT
)

insert into @SequenceGroup
select  TestID, (stuff((select '-' + cast([StimulusID] as nvarchar(100))
from @Sequence t1
where t2.RecordingID = t1.RecordingID
for xml path('')), 1, 1, '')), RecordingID
from @Sequence t2
group by RecordingID, TestID
order by RecordingID

select * from @SequenceGroup

update @Sequence
set SequenceGroupID = sg.ID
from @Sequence s
join @SequenceGroup sg on s.RecordingID=sg.RecordingID and s.TestID=sg.testid

select * from @Sequence

How to select unique subsequences in SQL?

2 Answers2