0

I have a Staging table with the following columns

  • VideoURL
  • Rating
  • Length
  • Thumbnail
  • Tags (comma-delimited)

Videos have a many to many relationship with the tags. I've created the following new tables:

  • Video
  • Tag
  • VideoTag

How do I parse the data in the Staging table into the three new tables? The Tag table should contain no duplicates. Also, I need to do a little formatting on the data before it is inserted into the new tables e.g. need to strip all letters from the Length column.

BrazenTongue
  • 179
  • 1
  • 9
  • 2
    I'm confused please clarify - Is this purely in SQL? In which case you should remove the ef and c# tags. – Preet Sangha Sep 02 '12 at 00:48
  • Oops, forgot to take those off. Actually, I'm interested in either a SQL or Entity Framework solution, but figured EF ultimately couldn't come close to the performance of straight SQL. – BrazenTongue Sep 02 '12 at 02:14

3 Answers3

1

How I would do it.

  1. Use this question to parse the CSV column into multiple rows another table with an artifical key to link the old table to the new one
  2. join the the two tables with another generated artificial key (the Many many join table primary key) column to create the many-many table
  3. fix up both ends of the many-many relationship
Community
  • 1
  • 1
Preet Sangha
  • 64,563
  • 18
  • 145
  • 216
1

Using an auxiliary numbers table, you can split the tags column into rows while keeping it associated with the VideoURL:

CREATE TABLE NumberPivot (NumberID INT PRIMARY KEY)
DECLARE @intLoopCounter INT
SELECT @intLoopCounter =0
SET NOCOUNT ON

WHILE @intLoopCounter <=999 BEGIN
   INSERT INTO NumberPivot
   VALUES (@intLoopCounter)
   SELECT @intLoopCounter = @intLoopCounter +1
END
GO


SELECT
  ContentPageID,
  Substring(',' + Tags + ','
            , numberID + 1
            , Charindex(',', ',' + Tags + ',', numberID + 1) - numberid - 1) AS value 
FROM   dbo.NumberPivot  AS np,
       Staging AS S
WHERE  numberid <= Len(',' + Tags + ',') - 1
   AND Substring(',' + Tags + ',', numberID, 1) = ',' 

So here we fill the Tags table with unique tags:

;WITH X AS (
SELECT
  VideoURL,
  Substring(',' + Tags + ',', numberID + 1, Charindex(',', ',' + Tags + ',', numberID + 1) - numberid - 1) AS Tag
FROM   dbo.NumberPivot  AS np,
       Staging AS S
WHERE  numberid                                      <= Len(',' + Tags + ',') - 1
   AND Substring(',' + Tags + ',', numberID, 1) = ',' 
)
INSERT Tag (Tag)
SELECT DISTINCT Tag FROM X;

Next fill the Videos table:

INSERT Video (VideoURL, Rating, Length, Thumbnail)
SELECT VideoURL, Rating, Length, Thumbnail
FROM Staging;

Finally fill VideoTag:

INSERT VideoTag (VideoURL, Tag)
SELECT
  VideoURL,
  Substring(',' + Tags + ',', numberID + 1, Charindex(',', ',' + Tags + ',', numberID + 1) - numberid - 1) AS Tag
FROM   dbo.NumberPivot  AS np,
       Staging AS S
WHERE  numberid                                      <= Len(',' + Tags + ',') - 1
   AND Substring(',' + Tags + ',', numberID, 1) = ',' 

Got the split string using number table from here

Ross Presser
  • 6,027
  • 1
  • 34
  • 66
1

try this , assuming videourl as video name and splitstring function return table has tagname column

create table video(...)
create table tag(...)
create table videotag(...)


insert video
select distinct(maintable.videourl) as videoname
from   maintable 


insert tag
select distinct(tag.tagname)
from   maintable cross apply SplitString(maintable.tags,',') tag

insert videotag
select maintable.videourl as videoname,tag.tagname
from   maintable   cross apply SplitString(maintable.tags,',') tag

If you auto generated ids from video and tag table then
while inserting into videotag table get associated auto generated id from
it's master table.

Splitstring function is from here

Community
  • 1
  • 1
Sankara
  • 1,469
  • 15
  • 22