Can I Group By an Unknown Number of Columns?

Question

I am currently trying to re-write a stored procedure to take into account the normalisation of one of our tables. In the original procedure we have two tables:

CREATE TABLE #t_batch
(batch_id   integer, 
thread_group NVARCHAR(60),
dye_code_1  NVARCHAR(10),
dye_conc_1  NUMERIC(19, 7),
dye_code_2  NVARCHAR(10),
dye_conc_2  NUMERIC(19, 7),
dye_code_3  NVARCHAR(10),
dye_conc_3  NUMERIC(19, 7),
dye_code_4  NVARCHAR(10),
dye_conc_4  NUMERIC(19, 7),
dye_code_5  NVARCHAR(10),
dye_conc_5  NUMERIC(19, 7),
dye_code_6  NVARCHAR(10),
dye_conc_6  NUMERIC(19, 7))

CREATE TABLE #t_group
(group_id   INTEGER IDENTITY(1, 1),
dye_code_1  NVARCHAR(10),
dye_conc_1  NUMERIC(19, 7),
dye_code_2  NVARCHAR(10),
dye_conc_2  NUMERIC(19, 7),
dye_code_3  NVARCHAR(10),
dye_conc_3  NUMERIC(19, 7),
dye_code_4  NVARCHAR(10),
dye_conc_4  NUMERIC(19, 7),
dye_code_5  NVARCHAR(10),
dye_conc_5  NUMERIC(19, 7),
dye_code_6  NVARCHAR(10),
dye_conc_6  NUMERIC(19, 7),
thread_group NVARCHAR(60), 
num_batches INTEGER)

After a number of actions #t_batch was populated with a number of records. We then inserted data into #t_group in the following way:

INSERT INTO #t_group
(dye_code_1, dye_conc_1, dye_code_2, dye_conc_2, dye_code_3, dye_conc_3,
dye_code_4, dye_conc_4, dye_code_5, dye_conc_5, dye_code_6, dye_conc_6, 
thread_group, num_batches)
SELECT dye_code_1, dye_conc_1, dye_code_2, dye_conc_2, dye_code_3, dye_conc_3, 
dye_code_4, dye_conc_4, dye_code_5, dye_conc_5, dye_code_6, dye_conc_6, 
thread_group, COUNT(batch_id_fk)
FROM #t_batch
GROUP BY dye_code_1, dye_conc_1, dye_code_2, dye_conc_2, dye_code_3, dye_conc_3, 
dye_code_4, dye_conc_4, dye_code_5, dye_conc_5, dye_code_6, dye_conc_6, 
thread_group
ORDER BY dye_code_1, dye_conc_1, dye_code_2, dye_conc_2, dye_code_3, dye_conc_3, 
dye_code_4, dye_conc_4, dye_code_5, dye_conc_5, dye_code_6, dye_conc_6, 
thread_group

So, we had a series of records that are grouped by the dye columns and a unique group_id for each unique combination of dyes and their concentrations. Also, there is a count of the batch records for each group.

However, since there is in reality no limit to the number of dyes for a batch the tables have been normalised:

CREATE TABLE #t_batch
(batch_id   INTEGER, 
thread_group NVARCHAR(60))

CREATE TABLE #t_batch_dye
(batch_id_fk INTEGER, 
stage   INTEGER,
sequence    INTEGER, 
dye_code    NVARCHAR(10),
dye_conc    NUMERIC(19,7))

CREATE TABLE #t_group
(group_id   INTEGER IDENTITY(1, 1),
thread_group NVARCHAR(60), 
num_batches INTEGER)

CREATE TABLE #t_group_dye
(group_id   INTEGER, 
stage   INTEGER,
sequence    INTEGER,
dye_code    NVARCHAR(10),
dye_conc    NUMERIC(19,7))

Now, my question is: assuming that we have #t_batch and #t_batch_dye populated and that there are a varying number of #t_batch_dye records for each record in #t_batch, how can I insert records into #t_group with a unique group_id for each unique combination of dyes and their concentrations as well as a count of the batches for each group?

Is this something I could use the PIVOT keyword for? The examples I have found on the web all seem to assume that the number of pivoted fields is known in advance.

Many thanks,

David

Glasgow, Scotland

Update:

What I have done is to use a function that returns a concatenated string of codes and concs and used that to group the data.

DECLARE @dyes NVARCHAR(2000)  

 SELECT @dyes = ISNULL(@dyes,'') + dye_code + ' ' + convert(nvarchar,      requested_dye_conc) + ' '
 FROM   #t_batch_dye
 WHERE  batch_id_fk = @batch_id
 ORDER BY dye_code ASC

score 3 · Answer 1 · edited May 23 '17 at 12:18

3

You're correct in assuming that PIVOT and more traditional methods of cross-tab querying expect you to know how many columns you want in advance. At that point, you'll need to use some dynamic SQL to get what you're after:

edited May 23 '17 at 12:18

Community

1
1

answered Oct 04 '12 at 15:05

Tim Lehner

14,813
4
59
76

Thanks for that. I'll investigate the links you posted. – Caledonian Coder Oct 04 '12 at 15:48

score 1 · Accepted Answer · answered Oct 04 '12 at 15:42

1

A partial answer, and not an ideal one: If you know that there will never be more than say 20 dye combinations, you can create another temp table with

select b.thread_group, 
case when d.sequence=1  then d.dye_code end as code1,
case when d.sequence=1  then d.dye_conc end as conc1,
case when d.sequence=2  then d.dye_code end as code2,
case when d.sequence=2  then d.dye_conc end as conc2,
case when d.sequence=3  then d.dye_code end as code3,
case when d.sequence=3  then d.dye_conc end as conc3,
<lots of boring copy&paste...>
case when d.sequence=20 then d.dye_code end as code20,
case when d.sequence=20 then d.dye_conc end as conc20
from #t_batch t, #t_batch_dye d
where t.batch_id  = d.batch_id

and then select your group out of that, using all of code1 to conc20. It's not beautiful, but it's clear. And I know it negates the whole point of normalising your tables out in the first place! Good luck.

answered Oct 04 '12 at 15:42

Vinny Roe

903
4
8

Thanks, Vinny. I'm afraid that the possible combinations are very large indeed! – Caledonian Coder Oct 04 '12 at 15:48
OK. I was working based on your original 6. One other bonkers idea is that you can concatenate a variable selecting from a table (I think - or is this just Sybase?). You have to do it with an 'update' statement. Thus: `update #t_batch_dye set @longstring = @longstring + dye_code + " " +convert(varchar, dye_conc) +" " where batch_id = @nextbatch` That would give you unique string for each. Needs more work... – Vinny Roe Oct 04 '12 at 16:01
This is what I eventually did. I used a function to concatenate the dye codes and concentrations to get a unique string and then grouped by that (see above). Very inelegant! – Caledonian Coder Oct 05 '12 at 08:40

Can I Group By an Unknown Number of Columns?

2 Answers2