3

I have two tables

Table: color_document
+----------+---------------------+
| color_id |    document_id      |
+----------+---------------------+
|   180907 | 4270851             |
|   180954 | 4270851             |
+----------+---------------------+

Table: color_group
+----------------+-----------+
| color_group_id | color_id  |
+----------------+-----------+
|              3 | 180954    |
|              4 | 180907    |
|             11 | 180907    |
|             11 | 180984    |
|             12 | 180907    |
|             12 | 180954    |
+----------------+-----------+

Is it possible for a query to get a result that looks something like this using multiple color id's to join the two tables?

Result
+----------------+--------------+
| color_group_id | document_id  |
+----------------+--------------+
|             12 | 4270851      |
+----------------+--------------+

Since Color Group 12 is the only group that has the exact same set of Colors that Document 4270851 has.

I've got some bad data that i'm being forced to work with so I've had to manufacture the color groups by finding each unique set of color_id's associated with document_id's. I'm trying to then create a new relationship directly between my manufactured color groups and documents.

I know I could probably do something with a GROUP_CONCAT to make a pseudo key of concatenated color ids, but I'm trying to find a solution that would also work in, say, Oracle. Am I barking up the completely wrong tree with this logic?

My ultimate goal is to be able to have a single row in a table that would represent any number of Colors that are associated with a Document to be exported to a completely different system than the one I'm working with.

Any thoughts/comments/suggestions are greatly appreciated.

Thank you in advance for looking at my question.

Ashley Medway
  • 7,151
  • 7
  • 49
  • 71
copaX
  • 63
  • 2
  • 6
  • What you want is called (exact) **[Relational Division](http://en.wikipedia.org/wiki/Relational_algebra#Division_.28.C3.B7.29)**. See this article: **[Divided We Stand: The SQL of Relational Division](https://www.simple-talk.com/sql/t-sql-programming/divided-we-stand-the-sql-of-relational-division/)** – ypercubeᵀᴹ Feb 28 '14 at 23:51
  • 1
    See also this question, with may ways to solve a similar problem: **[How to filter SQL results in a has-many-through relation](http://stackoverflow.com/questions/7364969/how-to-filter-sql-results-in-a-has-many-through-relation)** – ypercubeᵀᴹ Feb 28 '14 at 23:57
  • Do you want this for a specific `document_id` or for all of them? – ypercubeᵀᴹ Feb 28 '14 at 23:59
  • @ypercube I want it for all documents – copaX Mar 02 '14 at 00:08

2 Answers2

3

Do a normal join of the two tables, and count the number of rows in each pairing. Then test whether this is the same as the number of times each of the items appears in the original tables. If all are the same, then all color IDs must match.

SELECT a.color_group_id, a.document_id
FROM (
    SELECT color_group_id, document_id, COUNT(*) ct
    FROM color_document d
    JOIN color_group g ON d.color_id = g.color_id
    GROUP BY color_group_id, document_id) a
JOIN (
    SELECT color_group_id, COUNT(*) ct
    FROM color_group
    GROUP BY color_group_id) b
ON a.color_group_id = b.color_group_id and a.ct = b.ct
JOIN (
    SELECT document_id, COUNT(*) ct
    FROM color_document
    GROUP BY document_id) c
ON a.document_id = c.document_id and a.ct = c.ct

SQLFIDDLE

Barmar
  • 741,623
  • 53
  • 500
  • 612
  • The only thing I would add to this answer is that it would be a good idea to have a unique constraint on both `(color_group_id, color_id)` and `(color_id, document_id)`, or [this could happen](http://sqlfiddle.com/#!2/24fc27/1). – Air Mar 01 '14 at 00:29
  • This looks like it will get me what I need! At least from a couple test scenarios, haven't run it against the full data set yet. Thank you, @Bamar and everyone else who posted replies! – copaX Mar 02 '14 at 00:28
-1

If i understand your question correct you just have to join the two tables and then group the results by color_group_id an document_id.

SQL Fiddle

select color_group_id, document_id
from 
  color_document cd join
  color_group cg 
  on cd.color_id = cg.color_id
group by color_group_id, document_id

That query will give you this result set:

COLOR_GROUP_ID  DOCUMENT_ID
3               4270851
4               4270851
11              4270851
12              4270851

Is that what you want?

Matze
  • 325
  • 1
  • 12
  • 1
    Document 4270851 has two color IDs, but color group ID 3, 4, and 11 only have one of them, so they shouldn't be in the result. – Barmar Mar 01 '14 at 00:16
  • Then just add a HAVING COUNT(*) > 1 under the group by ;) – Matze Mar 01 '14 at 14:13
  • That's not enough. What if one of them has 2 color IDs and the other has 3? The need is to check that the counts are the same, not just more than one. – Barmar Mar 01 '14 at 14:24
  • @Bamar has the right idea. I'm looking to find the color group that has the exact same (unorderd) set of colors ids as a given document – copaX Mar 02 '14 at 00:12