1

I am needing to insert some database associations (Many-to-Many table in MS-SQL Server 2008) based off of a Person, and a Language. My incoming data includes a table full of people, a table full of languages, and people have a field of comma delimited languages that represent each language they are associated with.

So here's what the tables would look like

People

-----------------------------------------------
ID | First Name  | Last Name | Languages
-----------------------------------------------
1  | Paul        | Johnson   | English,Spanish
2  | Jack        | Johnson   | English,Hindi
3  | Mark        | Johnson   | NULL
-----------------------------------------------

Language

--------------
ID | Name  
--------------
1  | English  
2  | Spanish 
3  | Hindi
--------------    

And I need to get it into a table, which would, at the completion, look like this:

PeopleLanguage

----------------------
PeopleID | LanguageID
----------------------
1        | 1
1        | 2
2        | 1
2        | 3

Now, I'd like to avoid using cursors or while loops to iterate over each physician, and perform the operation, but I can't think of a way outside of that to accomplish (also, I don't fully understand cursors :p)

Hopefully SO will be able to come up with a good solution for me.

Thanks!

Paul Zaczkowski
  • 2,848
  • 1
  • 25
  • 26

2 Answers2

-1

Assuming the structures above, I was able to accomplish this by joining on a charindex check like so:

INSERT INTO PeopleLanguages (PeopleID, LanguageID)
SELECT P.Id, L.Id
FROM People P
JOIN Language L ON  CHARINDEX(L.Name, P.Languages) > 0
Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
drothe
  • 87
  • 3
  • 1
    This appears to work correctly (I haven't tested with an actual insert, but I've got a select showing the correct data)! Note, though, instead of using charindex, I used an IN and a custom Table valued function to split on the comma which looks something like this: ... JOIN Languages ON Languages.Name IN (SELECT DISTINCT [items] AS [LanguageName] FROM [BigSplit](People.Languages,',')) – Paul Zaczkowski Feb 20 '14 at 21:44
  • 1
    [This fiddle](http://sqlfiddle.com/#!3/172c2/1) shows how this simple charindex check can break. – Aaron Bertrand Feb 20 '14 at 21:53
  • 1
    @Paul you should probably post the answer you went with (including the definition of BigSplit) as an answer, and accept that, especially if that's what you went with. Currently this answer is not a very safe solution. – Aaron Bertrand Feb 20 '14 at 21:57
  • Quite right, that seems to defeat it. The IN with the custom table is definitely the way to go. Seems I was a bit to eager to keep myself within the confines of the tables provided. Thanks for pointing this out. :) – drothe Feb 20 '14 at 21:58
-1

Ouch...by seconds...Check the following working code on SQL Fiddle. Similar to the one proposed by drothe.

Wrond code deleted

You could also make some use from this other asnwer and try to use CTE's.

Checking according to Aaron Bertrand comment SQL Fiddle...not correct from my side, needed to include more test and check that CHARINDEX breaks.

Try this solution (working code on SQL Fiddle)

SELECT tbl.personID,l.LangID
FROM LANG l JOIN
(SELECT P.[PersonID],
       LTRIM(RTRIM(n.r.value('.', 'varchar(500)'))) [Language]
FROM   People  AS P
       CROSS APPLY (
        SELECT CAST('<A>' + REPLACE([Languages], ',', '</A><A>') + '</A>' AS XML)
        )          AS S(XMLCol)
       CROSS APPLY S.XMLCol.nodes('/A') AS n(r)) tbl ON l.Name = tbl.[Language]

This SO question and several of its answers give you different approaches, including the one used.

Here you have a long and detailed performance analysis of how to split a string using CLR.

Community
  • 1
  • 1
Yaroslav
  • 6,476
  • 10
  • 48
  • 89
  • 1
    Think you need to be careful (e.g. languages might be `English`, `English (Canadian)`, `English (British)`, `French`, `French (Canada)`, etc). May be safer to buffer with commas – Aaron Bertrand Feb 20 '14 at 21:37
  • Exactly what I had though of as well, @AaronBertrand. See my comment on drothe's answer which explains my solution. Hopefully that's correct, it seems to look right :D – Paul Zaczkowski Feb 20 '14 at 21:46
  • 2
    @PaulZaczkowski Great! Next you should ensure your split function is inline and not multi-statement, or CLR if possible. Some comparisons here: http://www.sqlperformance.com/2012/07/t-sql-queries/split-strings ... you should also consider a different design, IMHO. CSV strings have no place in a database. – Aaron Bertrand Feb 20 '14 at 21:48
  • 1
    Your fiddle is not even close to fair; you've rigged the results! Why did you not include `French` in your languages? Why did you not include a user who had `French (Canada)` but not `French`? [This fiddle](http://sqlfiddle.com/#!3/172c2/1) shows how this answer (and drothe's, currently) breaks. PersonID = 4 should not match LanguageID = 6. – Aaron Bertrand Feb 20 '14 at 21:51
  • Couldn't agree more, @AaronBertrand! Unfortunately I didn't come up with the solution that's been paid for and decided upon already. In a nutshell, we have an existing manual ETL import (all in one big lovely stored procedure :p) and they wanted to add this functionality. Typically I'd have no problem with a CSV in an import, since we usually import through a custom built application (So we can work with the data in memory before loading it into a DB), but that's not the case for this one. :/ – Paul Zaczkowski Feb 20 '14 at 21:53