3

Summary:
See below for Details. I'm copying the [unanswered] many-to-many question here to the top for readability:

    Given the "Input" table, what is the SQL to generate the 3rd "Output"
    table (Person_plays_Instrument)?



Current input (1 table):

OriginalTable:
PersonId PersonName Instrument_1 Instrument_2 Instrument_3 MailingAddress HomePhone
--------|----------|------------|------------|------------|--------------|------------
1        Bob        Violin       Viola        Trumpet      someplace      111-111-1111
2        Suzie      Cello        Flute        <null>       otherplace     222-222-2222
3        Jim        Violin       <null>       <null>       thirdplace     333-333-3333

Desired output (3 tables):

Person:
Id Name   MailingAddress HomePhone
--|------|--------------|------------
1  Bob    someplace      111-111-1111
2  Suzie  otherplace     222-222-2222
3  Jim    thirdplace     333-333-3333

Instrument:
Id Name
--|-------
1  Violin
2  Cello
3  Viola
4  Flute
5  Trumpet

Person_plays_Instrument:
PersonId InstrumentId
--------|------------
1        1
1        3
1        5
2        2
2        4
3        1

Details:

I have a single flat SQL table which started out as a spreadsheet. I'd like to normalize it. I'll split this into 1 question for each table.

Questions 1 and 2 have been answered, but I am leaving them in in case others find them helpful.

Questions:

Question #1: [answered]
How do I generate Person table?

Answer #1:
This wonderful post gets me 2/3rds of the way there. For the one-to-many tables, I'm set. Here's the code:

[add autonumber field to OriginalTable, name it PersonId]
[create empty Person table with Id, Name, MailingAddress, HomePhone fields]

INSERT INTO Person (Id, Name, MailingAddress, HomePhone)
  SELECT o.PersonID, o.PersonName, o.MailingAddress, o.HomePhone
  FROM OriginalTable as o
  WHERE o.PersonName Is Not Null;

Question #2: [attempted] (better version by @Branko in Accepted Answer)
How do I generate Instrument table?

Answer #2:
Again, one-to-many. At first, the multiple columns had me stumped.
The solution came in two parts:

  • I'd just have to repeat the INSERT command, once for each column.
  • Using this post and the IN operator, I can check each time to confirm I hadn't already inserted that value.

Here's the code:

[create empty Instrument table with Id[autonumber], Name fields]

INSERT INTO Instrument (Name)
  SELECT Distinct o.Instrument_1
  FROM OriginalTable as o
  WHERE o.Instrument_1 Is Not Null
  AND o.Instrument_1 Not In (SELECT Name from Instrument);

INSERT INTO Instrument (Name)
  SELECT Distinct o.Instrument_2
  FROM OriginalTable as o
  WHERE o.Instrument_2 Is Not Null
  AND o.Instrument_2 Not In (SELECT Name from Instrument);

INSERT INTO Instrument (Name)
  SELECT Distinct o.Instrument_3
  FROM OriginalTable as o
  WHERE o.Instrument_3 Is Not Null
  AND o.Instrument_3 Not In (SELECT Name from Instrument);

Question #3: [unanswered]
How do I generate Person_plays_Instrument table?

Community
  • 1
  • 1
Bukov
  • 650
  • 8
  • 19
  • 1
    Is there `OriginalTable.PersonID`? – Branko Dimitrijevic Feb 07 '13 at 11:21
  • Yes, sorry. I mentioned it in the "[add autonumber field to OriginalTable, name it PersonId]" bit. I actually intentionally removed it from my example because adding the autonumber field is one of the main steps in the answer post I linked to in Q#1. In retrospect it made things more confusing. I'll edit it in for clarity – Bukov Feb 07 '13 at 16:34

1 Answers1

4

Assuming there is OriginalTable.PersonID, which you haven't shown us, but is implied by your own answer #1, the answer #3 can be expressed simply as:

INSERT INTO Person_plays_Instrument (PersonId, InstrumentId)
SELECT PersonID, Instrument.Id
FROM
    OriginalTable
    JOIN Instrument
        ON OriginalTable.Instrument_1 = Instrument.Name
        OR OriginalTable.Instrument_2 = Instrument.Name
        OR OriginalTable.Instrument_3 = Instrument.Name;

BTW, there is a more concise way to express the answer #2:

INSERT INTO Instrument (Name)
    SELECT *
    FROM (
        SELECT o.Instrument_1 I
        FROM OriginalTable as o
        UNION
        SELECT o.Instrument_2
        FROM OriginalTable as o
        UNION
        SELECT o.Instrument_3
        FROM OriginalTable as o
    ) Q
    WHERE I IS NOT NULL;

And here is a fully working SQL Fiddle example for MS SQL Server. Other DBMSes should behave similarly. BTW, you should tag your question appropriately to indicate your DBMS.

Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
  • I am in awe of this answer. It works exactly as hoped. In addition, the link to SQL Fiddle and the improved code for my Answer #2 are beautiful. Can't thank you enough. Only wish I had more than one upvote button :) – Bukov Feb 07 '13 at 16:42
  • As a followup: I retagged the way you suggested. I also want to reiterate my thanks for the thoroughness of your SQL Fiddle example. I'm usually a java programmer; still a SQL newbie. I've learned more about SQL from your example in the last 10 minutes than in a week of tutorial sites and googling. Thanks again! – Bukov Feb 07 '13 at 16:50
  • @Bukov You are most welcome! But I still don't see any tags such as [tag:sql-server], [tag:oracle], [tag:postgresql] or [tag:mysql]... – Branko Dimitrijevic Feb 07 '13 at 19:33
  • Ohh, you meant a tag for which *specific* DBMS I was using. At the moment I'm editing in MS Access because I don't have anything better. Eventually, once I've got the schema fixed and the data normalized (using the method you've just helped me with!) I'll be moving everything into SQLite for a Rails project. I'll add Access and SQLite tags to the original question for you :) – Bukov Feb 07 '13 at 20:27
  • May I 2 questions about your more-concise Answer #2: (a) What is the letter "Q" doing right after the brackets? (b) "Where I Is Not Null" looks like it only applies to Instrument_1. Yet magically we're not getting back nulls even though we include Instrument_2 and _3, which have nulls. How is this happening? – Bukov Feb 07 '13 at 22:56
  • 1
    @Bukov (a) This is just a peculiarity of the MS SQL Server syntax - it requires all subqueries to be named (even though that name is not really used anywhere). Other DBMSes might allow you to omit that. – Branko Dimitrijevic Feb 07 '13 at 23:15
  • 1
    @Bukov (b) The WHERE clause exists at the level of the **outer** query, so it applies to all rows returned from the inner query - look at how parenthesis ( and ) are organized. To put it differently: first the inner query UNIONs all rows together, then the outer query filters-out the NULLs from all these rows. You could achieve the same effect by just repeating the WHERE 3 times in the inner query (at which point you could omit the outer query entirely), but why repeat something if you don't have to? – Branko Dimitrijevic Feb 07 '13 at 23:17
  • Thanks again for such helpful answers! I believe understand much better now. (a) Q was just the alias for what was in the brackets. (b) The inner query makes a temporary Table. The table's 1 Column takes it's name from whatever the 1st thing in the Union is. Then the outer table looks at the name you gave that 1 Column and drops out any nulls – Bukov Feb 08 '13 at 00:37