0

I have a table which contains some denormalized data something like the following (fake to simplify my question):

Book Table

ID, Title, Author-Name, Publisher-Name, Category

In another table I have something like the following:

Author Table

ID, Author-Name, Address

Publisher Table

ID, Publisher-Name, Address

Assume that Publisher-Name and Author-Name are always unique.

Desired Result-Set

Now, all I want to do is create a query that creates a result-set that includes :

 1. all columns from Book table
 2. Author.ID 
 3. Publisher.ID

The final result set would look something like the following:

Result-set Columns

ID, Title, Author-Name, Author.ID as [AuthorID], Publisher-Name, Publisher.ID as [PublisherID] Category

Let's say there are two rows in the Book Table, then the result-set would have two rows which include the appropriate values for the Author.ID and Publisher.ID, because the lookup has been done in the query and returned in the result-set.

What I've Tried : join

I've tried using various joins but I always get more rows than just two rows, because the join seems to join on the Publisher.ID AND the Author.ID and I get 1 row for each of them which ends up giving me 4 rows instead of 2.

Sample Data

Books

1, 'All The Time', 'Fred Smith', 'Big Pub Co.', 'non-fiction'
2, 'Biggest Title Ever', 'John Jones', 'Small Pub Co.', 'fiction'

Authors

100, 'Fred Smith', 'Yukon, AK'
101, 'John Jones', 'Happy, VT'

Publishers

300, 'Big Pub Co', 'Angry, IL'
301, 'Small Pub Co', 'Someplace, IN'

Expected Result-Set

1, 'All The Time', 'Fred Smith', 100, 'Big Pub Co.', 300, 'non-fiction'
2, 'Biggest Title Ever', 'John Jones', 101, 'Small Pub Co.' 301, 'fiction'
D-Shih
  • 44,943
  • 6
  • 31
  • 51
raddevus
  • 8,142
  • 7
  • 66
  • 87
  • 3
    Could you provide some sample data and expect result? that really help – D-Shih Oct 31 '18 at 13:01
  • 1
    Share your query where you've used `join`. – Mayank Porwal Oct 31 '18 at 13:08
  • @D-Shih Added sample data. – raddevus Oct 31 '18 at 13:10
  • missing dots in Publisher ? – pi.314 Oct 31 '18 at 13:11
  • 1
    If it doesn't already exist, you should build out an associative table between books and authors, in case 2 authors with the same name exist, same deal with publishers. – user2366842 Oct 31 '18 at 13:51
  • Constraints are not needed to query. Normalization level doesn't matter. What is necessary & sufficient is what the tables mean. You do not clearly describe what rows you want in the result. Eg talk about "lookup" is vague. Say what rows go in your base tables in terms of a business situation. Then say what rows go in your query result in terms of a business situation and/or base tables. PS Please format your tables with headings. See [mcve]. [Is there any rule of thumb to construct SQL query from a human-readable description?](https://stackoverflow.com/a/33952141/3404097) – philipxy Oct 31 '18 at 19:10
  • @philipxy wait did you see the expected results section? I described all the data. Thanks for all your help. – raddevus Oct 31 '18 at 19:28
  • Here the simplest guesses from column names happen to also be the meanings of the base & query tables because your query is so simple. But you really don't describe your result clearly in text--your text is not clear & adds nothing to the guessing from column names. My comment mentioned both those aspects--meanings & examples-- in saying how to compose/communicate a query specification generally. That is going to matter when your query is not almost trivial. If you find yourself writing "something like" it means you know you haven't been clear. – philipxy Oct 31 '18 at 20:28

3 Answers3

2

You can try JOIN` and get your want columns from those tables.

SELECT 
    b.ID, 
    b.Title,
    b.[Author-Name],
    a.id,
    p.[Publisher-Name],
    p.id,b.Category
FROM Book b 
JOIN Author a on b.[Author-Name] = a.[Author-Name]
JOIN  Publisher p ON b.[Publisher-Name] = p.[Publisher-Name]
D-Shih
  • 44,943
  • 6
  • 31
  • 51
  • 1
    while this is good for the data structure given, it's unfortunately not bulletproof, especially if there's multiple authors or publishers with the same name! – user2366842 Oct 31 '18 at 13:53
  • @user2366842 Very good point. It looks as if I had two problems and one of those was that there was duplicate data in the publisher and author tables which caused multiple rows to be returned in the original query. – raddevus Oct 31 '18 at 14:12
  • @raddevus Could you provide more sample data and expect result let me figure out the problem :) – D-Shih Oct 31 '18 at 14:15
  • @D-Shih Thanks for your help. Feeling a bit bad here because I see that it looks as if the original join query is possibly working but I have duplicate data in sub-tables (join tables) that is causing the duplicate rows in the result-set. – raddevus Oct 31 '18 at 14:20
  • Sorry what do you mean? because It is work in sqlfiddle from your samlpe data https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=c29c25b165335076eef771168060b439 – D-Shih Oct 31 '18 at 14:26
2

So, a few things:

  1. Your data is not denormalized, it is (correctly) normalized.
  2. You have made an odd decision to store author name and publisher name in the books table. This will cause problems if you have two authors or publishers with the same name in their respective tables. If that is guaranteed never to happen (that is, you require a variant of the name in the case of same-name authors or publishers) then you can get rid of the ID column in their respective tables.
  3. The solution to your problem is to use a JOIN as you deduced, but we can't tell where you made your mistake because you did not show what you tried.

All that said, it sounds like you accidentally performed a cartesian JOIN, which is where you join multiple tables but fail to provide the join condition.

So, I suspect you wrote something along the lines of (there are different ways to express this):

FROM Books JOIN Authors JOIN Publishers

and you should have written:

FROM Books INNER JOIN Authors ON Books.Author-Name = Authors.Author-Name
   INNER JOIN Publishers ON Books.Publisher-Name = Authors.Publisher-Name
Larry Lustig
  • 49,320
  • 14
  • 110
  • 160
  • Very good "extra" information. And you are correct, I did the first join (without the inner) and I will try the second one (with the inner) and let you know. – raddevus Oct 31 '18 at 13:42
  • "Normalized" has two broad meanings--normalized to some "1NF" (which itself has no fixed meaning) & normalization to higher NFs. Apparently you don't mean the latter in your first bullet because that would contradict your second bullet. But presumably it's what @raddevus meant when they used it. – philipxy Oct 31 '18 at 19:19
  • @philipxy My use of denormalized was in reference to the string values that are copied into the Book table for author and publisher name instead of depending on the author.Id publisher.id and forcing another lookup. It was a loose interpretation of denormalization mostly in an attempt to point out the oddity of author name/ publisher name. The reason for that data in real solution is so the service can do one query to the Book table and have info it needs without the extra query or join. Thanks – raddevus Oct 31 '18 at 19:26
  • 1
    @raddevus Thanks. It is more helpful to always put in the effort to say what you mean, instead of writing something that doesn't say what you mean. (See my last comment on the question.) – philipxy Oct 31 '18 at 20:41
0

Instead of Inner i would suggest you left join in case there won't be match between tables.

SELECT
    ...
FROM
    BookTable BOOK
    LEFT JOIN
    AuthorTable AT
        ON AT.Author-Name = BOOK.Author-Name
    LEFT JOIN
    PublisherTable PT
        ON PT.Publisher-Name = BOOK.Publisher-Name

However for take sure your rows from Book Table wont be multiplied: use outer apply:

SELECT
    BOOK.ID,
    BOOK.Title,
    BOOK.Author-Name,
    AT.ID AS AuthorID,
    BOOK.Publisher-Name,
    PT.ID AS PublisherID,
    BOOK.Category
FROM
    BookTable BOOK
    OUTER APPLY (SELECT TOP 1 ID FROM AuthorTable AT WHERE AT.Author-Name = BOOK.Author-Name) AT
    OUTER APPLY (SELECT TOP 1 ID FROM PublisherTable PT WHERE PT.Publisher-Name = BOOK.Publisher-Name) PT
pi.314
  • 622
  • 5
  • 16