12

A brief explanation on the relevant domain part:

A Category is composed of four data:

  1. Gender (Male/Female)
  2. Age Division (Mighty Mite to Master)
  3. Belt Color (White to Black)
  4. Weight Division (Rooster to Heavy)

So, Male Adult Black Rooster forms one category. Some combinations may not exist, such as mighty mite black belt.

An Athlete fights Athletes of the same Category, and if he classifies, he fights Athletes of different Weight Divisions (but of the same Gender, Age and Belt).

To the modeling. I have a Category table, already populated with all combinations that exists in the domain.

CREATE TABLE Category (
  [Id] [int] IDENTITY(1,1) NOT NULL,
  [AgeDivision_Id] [int] NULL,
  [Gender] [int] NULL,
  [BeltColor] [int] NULL,
  [WeightDivision] [int] NULL
)

A CategorySet and a CategorySet_Category, which forms a many to many relationship with Category.

CREATE TABLE CategorySet (
  [Id] [int] IDENTITY(1,1) NOT NULL,
  [Championship_Id] [int] NOT NULL,
)

CREATE TABLE CategorySet_Category (
  [CategorySet_Id] [int] NOT NULL,
  [Category_Id] [int] NOT NULL
)

Given the following result set:

   | Options_Id | Championship_Id | AgeDivision_Id | BeltColor | Gender | WeightDivision |
   |------------|-----------------|----------------|-----------|--------|----------------|
1. | 2963       | 422             | 15             | 7         | 0      | 0              |
2. | 2963       | 422             | 15             | 7         | 0      | 1              |
3. | 2963       | 422             | 15             | 7         | 0      | 2              |
4. | 2963       | 422             | 15             | 7         | 0      | 3              |
5. | 2964       | 422             | 15             | 8         | 0      | 0              |
6. | 2964       | 422             | 15             | 8         | 0      | 1              |
7. | 2964       | 422             | 15             | 8         | 0      | 2              |
8. | 2964       | 422             | 15             | 8         | 0      | 3              |

Because athletes may fight two CategorySets, I need CategorySet and CategorySet_Category to be populated in two different ways (it can be two queries):

One Category_Set for each row, with one CategorySet_Category pointing to the corresponding Category.

One Category_Set that groups all WeightDivisions in one CategorySet in the same AgeDivision_Id, BeltColor, Gender. In this example, only BeltColor varies.

So the final result would have a total of 10 CategorySet rows:

| Id | Championship_Id | 
|----|-----------------|
| 1  | 422             |
| 2  | 422             | 
| 3  | 422             |
| 4  | 422             | 
| 5  | 422             | 
| 6  | 422             |
| 7  | 422             |
| 8  | 422             |
| 9  | 422             |  /* groups different Weight Division for BeltColor 7 */
| 10 | 422             |  /* groups different Weight Division for BeltColor 8 */

And CategorySet_Category would have 16 rows:

| CategorySet_Id | Category_Id |
|----------------|-------------|
| 1              | 1           |
| 2              | 2           |
| 3              | 3           |
| 4              | 4           |
| 5              | 5           |
| 6              | 6           |
| 7              | 7           |
| 8              | 8           |
| 9              | 1           | /* groups different Weight Division for BeltColor 7 */
| 9              | 2           | /* groups different Weight Division for BeltColor 7 */
| 9              | 3           | /* groups different Weight Division for BeltColor 7 */
| 9              | 4           | /* groups different Weight Division for BeltColor 7 */
| 10             | 5           | /* groups different Weight Division for BeltColor 8 */
| 10             | 6           | /* groups different Weight Division for BeltColor 8 */
| 10             | 7           | /* groups different Weight Division for BeltColor 8 */
| 10             | 8           | /* groups different Weight Division for BeltColor 8 */

I have no idea how to insert into CategorySet, grab it's generated Id, then use it to insert into CategorySet_Category

I hope I've made my intentions clear.

I've also created a SQLFiddle.

Edit 1: I commented in Jacek's answer that this would run only once, but this is false. It will run a couple of times a week. I have the option to run as SQL Command from C# or a stored procedure. Performance is not crucial.

Edit 2: Jacek suggested using SCOPE_IDENTITY to return the Id. Problem is, SCOPE_IDENTITY returns only the last inserted Id, and I insert more than one row in CategorySet.

Edit 3: Answer to @FutbolFan who asked how the FakeResultSet is retrieved.

It is a table CategoriesOption (Id, Price_Id, MaxAthletesByTeam)

And tables CategoriesOptionBeltColor, CategoriesOptionAgeDivision, CategoriesOptionWeightDivison, CategoriesOptionGender. Those four tables are basically the same (Id, CategoriesOption_Id, Value).

The query look like this:

SELECT * FROM CategoriesOption co
LEFT JOIN CategoriesOptionAgeDivision ON 
    CategoriesOptionAgeDivision.CategoriesOption_Id = co.Id
LEFT JOIN CategoriesOptionBeltColor ON 
    CategoriesOptionBeltColor.CategoriesOption_Id = co.Id
LEFT JOIN CategoriesOptionGender ON 
    CategoriesOptionGender.CategoriesOption_Id = co.Id
LEFT JOIN CategoriesOptionWeightDivision ON 
    CategoriesOptionWeightDivision.CategoriesOption_Id = co.Id
Ortiga
  • 8,455
  • 5
  • 42
  • 71
  • One question: How come you have only 8 rows in your `fakeresultset` and expecting 10 different rows in your `CategorySet` table? I couldn't quite follow the logic behind it. – FutbolFan Aug 03 '15 at 19:08
  • @FutbolFan As I said, it can be two queries. 1 `CategorySet` and 1 `CategorySet_Category` for each row in the fake result set, totaling 8 rows each. And 1 `CategorySet` for each BeltColor, making two more of it and 8 more rows of `CategorySet_Category` – Ortiga Aug 03 '15 at 19:24
  • @FutbolFan Maybe I can write the second one if the first one is provided, despite my lack of skills in SQL – Ortiga Aug 03 '15 at 19:27
  • Okay, let me ask you this then, how are you getting those "fake" result to populate? – FutbolFan Aug 03 '15 at 19:55
  • @FutbolFan see edit #3 – Ortiga Aug 03 '15 at 19:57
  • I have posted what I believe is a working solution for your first requirement. I am working on your second requirement on populating the `categoryset_category` table. Hope that helps! – FutbolFan Aug 03 '15 at 20:16
  • You are saying that `Category` table is already populated. So, we should use existing `IDs` from this table. I don't see them in the "given result set". How are you going to know these `Category.IDs`? The "given result set" should have one more column: `Category.ID`. Once that part is solved, the rest becomes trivial - use the `OUTPUT` clause. – Vladimir Baranov Aug 03 '15 at 23:50
  • `Category` is populated. It is provided in the SQL Fiddle, including Ids. And I have `CategorySet`, not populated, and `CategorySet_Category` which links CategorySets to Categories (many to many). I need to Insert `CategorySet`, grab the generated ids, and use them to insert in the `CategorySet_Category` table – Ortiga Aug 04 '15 at 01:49
  • @VladimirBaranov I've edited my answer to provide an explanation on the domain. They will have to be selected given their combination of `Gender`, `Age Division`, `Belt Color` and `Weight Division`. – Ortiga Aug 04 '15 at 02:28
  • @Andre, in your SQL Fiddle the sample data in `Category` table is confusing. It has IDs 1 and 5 with exactly same set of values in other columns. Same with IDs 2 and 6 and so on. What `category.id` should receive a row from `FakeResultSet` that has `AgeDivision_Id=15`, `Gender=0`, `BeltColor=8`, `WeightDivision=0` ? 1 or 5? if 1, why not 5? if 5, why not 1? – Vladimir Baranov Aug 04 '15 at 02:39
  • @VladimirBaranov sorry, I may have posted the wrong link in one of my edits. It's fixed now, the BeltColor changes – Ortiga Aug 04 '15 at 02:42

5 Answers5

2

The solution described here will work correctly in multi-user environment and when destination tables CategorySet and CategorySet_Category are not empty. I used schema and sample data from your SQL Fiddle.

First part is straight-forward

(ab)use MERGE with OUTPUT clause.

MERGE can INSERT, UPDATE and DELETE rows. In our case we need only to INSERT. 1=0 is always false, so the NOT MATCHED BY TARGET part is always executed. In general, there could be other branches, see docs. WHEN MATCHED is usually used to UPDATE; WHEN NOT MATCHED BY SOURCE is usually used to DELETE, but we don't need them here.

This convoluted form of MERGE is equivalent to simple INSERT, but unlike simple INSERT its OUTPUT clause allows to refer to the columns that we need.

MERGE INTO CategorySet
USING
(
    SELECT
        FakeResultSet.Championship_Id
        ,FakeResultSet.Price_Id
        ,FakeResultSet.MaxAthletesByTeam
        ,Category.Id AS Category_Id
    FROM
        FakeResultSet
        INNER JOIN Category ON
            Category.AgeDivision_Id = FakeResultSet.AgeDivision_Id AND
            Category.Gender = FakeResultSet.Gender AND
            Category.BeltColor = FakeResultSet.BeltColor AND
            Category.WeightDivision = FakeResultSet.WeightDivision
) AS Src
ON 1 = 0
WHEN NOT MATCHED BY TARGET THEN
INSERT
    (Championship_Id
    ,Price_Id
    ,MaxAthletesByTeam)
VALUES
    (Src.Championship_Id
    ,Src.Price_Id
    ,Src.MaxAthletesByTeam)
OUTPUT inserted.id AS CategorySet_Id, Src.Category_Id 
INTO CategorySet_Category (CategorySet_Id, Category_Id)
;

FakeResultSet is joined with Category to get Category.id for each row of FakeResultSet. It is assumed that Category has unique combinations of AgeDivision_Id, Gender, BeltColor, WeightDivision.

In OUTPUT clause we need columns from both source and destination tables. The OUTPUT clause in simple INSERT statement doesn't provide them, so we use MERGE here that does.

The MERGE query above would insert 8 rows into CategorySet and insert 8 rows into CategorySet_Category using generated IDs.

Second part

needs temporary table. I'll use a table variable to store generated IDs.

DECLARE @T TABLE (
    CategorySet_Id int
    ,AgeDivision_Id int
    ,Gender int
    ,BeltColor int);

We need to remember the generated CategorySet_Id together with the combination of AgeDivision_Id, Gender, BeltColor that caused it.

MERGE INTO CategorySet
USING
(
    SELECT
        FakeResultSet.Championship_Id
        ,FakeResultSet.Price_Id
        ,FakeResultSet.MaxAthletesByTeam
        ,FakeResultSet.AgeDivision_Id
        ,FakeResultSet.Gender
        ,FakeResultSet.BeltColor
    FROM
        FakeResultSet
    GROUP BY
        FakeResultSet.Championship_Id
        ,FakeResultSet.Price_Id
        ,FakeResultSet.MaxAthletesByTeam
        ,FakeResultSet.AgeDivision_Id
        ,FakeResultSet.Gender
        ,FakeResultSet.BeltColor
) AS Src
ON 1 = 0
WHEN NOT MATCHED BY TARGET THEN
INSERT
    (Championship_Id
    ,Price_Id
    ,MaxAthletesByTeam)
VALUES
    (Src.Championship_Id
    ,Src.Price_Id
    ,Src.MaxAthletesByTeam)
OUTPUT
    inserted.id AS CategorySet_Id
    ,Src.AgeDivision_Id
    ,Src.Gender
    ,Src.BeltColor
INTO @T(CategorySet_Id, AgeDivision_Id, Gender, BeltColor)
;

The MERGE above would group FakeResultSet as needed and insert 2 rows into CategorySet and 2 rows into @T.

Then join @T with Category to get Category.IDs:

INSERT INTO CategorySet_Category (CategorySet_Id, Category_Id)
SELECT
    TT.CategorySet_Id
    ,Category.Id AS Category_Id
FROM
    @T AS TT
    INNER JOIN Category ON
        Category.AgeDivision_Id = TT.AgeDivision_Id AND
        Category.Gender = TT.Gender AND
        Category.BeltColor = TT.BeltColor
;

This will insert 8 rows into CategorySet_Category.

Community
  • 1
  • 1
Vladimir Baranov
  • 31,799
  • 5
  • 53
  • 90
  • One thing I didn't understand, what does 1 = 0 is for? – Ortiga Aug 05 '15 at 18:26
  • 1
    [`MERGE`](https://msdn.microsoft.com/en-us/library/bb510625.aspx) can `INSERT`, `UPDATE` and `DELETE` rows. In our case we need only to `INSERT`. `1=0` is always false, so the `NOT MATCHED BY TARGET` part is always executed. In general, there could be other branches, but we don't need them here. This convoluted form of `MERGE` is equivalent to simple `INSERT`, but unlike simple `INSERT` its `OUTPUT` clause allows to refer to the columns that we need. – Vladimir Baranov Aug 05 '15 at 22:47
1

Here is not the full answer, but direction which you can use to solve this:

1st query:

select row_number() over(order by t, Id)  as n, Championship_Id
from (
select distinct 0 as t, b.Id, a.Championship_Id
from FakeResultSet as a
inner join
Category as b
on
a.AgeDivision_Id=b.AgeDivision_Id and
a.Gender=b.Gender and
a.BeltColor=b.BeltColor and
a.WeightDivision=b.WeightDivision
union all
select distinct 1, BeltColor, Championship_Id
from FakeResultSet
) as q

2nd query:

select q2.CategorySet_Id, c.Id as Category_Id from (
select row_number() over(order by t, Id)  as CategorySet_Id, Id, BeltColor
from (
    select distinct 0 as t, b.Id, null as BeltColor
    from FakeResultSet as a
    inner join
    Category as b
    on
    a.AgeDivision_Id=b.AgeDivision_Id and
    a.Gender=b.Gender and
    a.BeltColor=b.BeltColor and
    a.WeightDivision=b.WeightDivision
    union all
    select distinct 1, BeltColor, BeltColor
    from FakeResultSet
) as q
) as q2
inner join
Category as c
on
(q2.BeltColor is null and q2.Id=c.Id)
OR
(q2.BeltColor = c.BeltColor)

of course this will work only for empty CategorySet and CategorySet_Category tables, but you can use select coalese(max(Id), 0) from CategorySet to get current number and add it to row_number, thus you will get real ID which will be inserted into CategorySet row for second query

Iłya Bursov
  • 23,342
  • 4
  • 33
  • 57
1

What I do when I run into these situations is to create one or many temporary tables with row_number() over clauses giving me identities on the temporary tables. Then I check for the existence of each record in the actual tables, and if they exist update the temporary table with the actual record ids. Finally I run a while exists loop on the temporary table records missing the actual id and insert them one at a time, after the insert I update the temporary table record with the actual ids. This lets you work through all the data in a controlled manner.

SumGuy
  • 602
  • 7
  • 18
0

@@IDENTITY is your friend to the 2nd part of question https://msdn.microsoft.com/en-us/library/ms187342.aspx and Best way to get identity of inserted row?

Some API (drivers) returns int from update() function, i.e. ID if it is "insert". What API/environment do You use?

I don't understand 1st problem. You should not insert identity column.

Community
  • 1
  • 1
Jacek Cz
  • 1,872
  • 1
  • 15
  • 22
  • But how do I get the correct Id? If I insert two rows, and `select SCOPE_IDENTITY()`, it will return the last generated, 2. In the second table, I need to insert data for both 1 and 2. – Ortiga Jul 30 '15 at 20:00
  • This is a one time import script and will be executed MSSQL studio – Ortiga Jul 30 '15 at 20:01
  • 2
    you use the OUTPUT clause. – HLGEM Aug 03 '15 at 17:20
0

Below query will give final result For CategorySet rows:

SELECT 
       ROW_NUMBER () OVER (PARTITION BY  Championship_Id ORDER BY Championship_Id) RNK,
       Championship_Id

FROM
(
    SELECT 
            Championship_Id
            ,BeltColor
    FROM #FakeResultSet
    UNION ALL
    SELECT 
            Championship_Id,BeltColor
    FROM #FakeResultSet
    GROUP BY Championship_Id,BeltColor
)BASE    
wiretext
  • 3,302
  • 14
  • 19
Ashish Kumar
  • 157
  • 1
  • 7