Selecting id's where all rows match

Question

I have two tables : DOCUMENT and METADATA. DOCUMENT stores an ID and some informations we're not interested in, METADATA stores "tags" for those documents. A tag is composed of a key and a value.

So for one document, there is only one entry in the DOCUMENT table, but possibly many in the METADATA table.

Now what I need is to pass a set of keys/values, and retrieve from the METADATA table only the documents that match ALL the keys/values. Which means inspecting different rows "at the same time", well, I don't really know how to do it.

Quick example:

META_KEY | META_VALUE | META_DOCUMENT_ID
----------------------------------------
Firstname| Chris      | 1
Lastname | Doe        | 1
Firstname| Chris      | 2
Lastname | Moe        | 2

So if I query with the following tags : "Firstname"="Chris", "Lastname"="Doe", I want 1 as result. If I only specify "Firstname"="Chris" I want both 1 and 2 as results.

Thanks a lot for any help !

EDIT :

How about something where I count the number of tags that have to match ? Like this :

select meta_document_id, count(*) from metadata where (meta_key = 'Firstname' and meta_value = 'Chris') or (meta_key = 'Lastname' and meta_value = 'Doe') group by meta_document_id

With the count(*) I can easily find out if all the input key/value pairs have matched. How would that run performance-wise ?

why cant use select META_DOCUMENT_ID from table where META_VALUE=lastname — safin chacko, Feb 19 '16 at 13:44
@safinchacko May be because he want to get not all id which have `lastname` but id where `lastname` == smth — Batanichek, Feb 19 '16 at 13:45
This looks bad on a larger scale, I might need to retrieve documents with 10-20 METADATA keys/values... — Christophe Schutz, Feb 19 '16 at 13:46
What do you mean by "if I query with..."? How would you pass the tags to the query? It seems you need to build a dynamic query based on your tags using something other than pure SQL. — bumpy, Feb 19 '16 at 13:48
I'll use hibernate with Java. But indeed I don't even have a clue about how to pass the arguments. I guess a loop is inevitable but I'd need the query to be solid with a lot of arguments and/or much data in the table. — Christophe Schutz, Feb 19 '16 at 13:50
This is perhaps THE classic relational database anti-pattern. It's the old "Create a 'database in a database'" thing. Don't store data this way. Yes, it's possible. No, you don't want to. Best of luck. — Bob Jarvis - Слава Україні, Feb 19 '16 at 14:10
I wish I could avoid it, I really do. Do you have an alternative considering I have no clue about what the tags may be and how they will evolve ? — Christophe Schutz, Feb 19 '16 at 14:12
It's also called an "entity-attribute-value" model. Take a look at [the Wikipedia entry](https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model). You might also want to look at [Bill Karwin's book on SQL anitpatterns](https://pragprog.com/book/bksqla/sql-antipatterns). I think you're probably looking at using something other than a relational database for your data. Best of luck. — Bob Jarvis - Слава Україні, Feb 19 '16 at 14:19

score 3 · Answer 1 · edited May 23 '17 at 12:07

Well, you are employing a database model named "key-value" or "Entity-attributte-value".

This is usually not a best choice, you can read more on this in these questions:

You need two separate queries for these two cases like this:

SELECT distinct META_DOCUMENT_ID
FROM METADATA 
WHERE meta_key = 'Firstname' and meta_value = 'Chris'

SELECT distinct m1.META_DOCUMENT_ID
FROM METADATA m1
JOIN METADATA m2
ON m1.META_DOCUMENT_ID = m2.META_DOCUMENT_ID
WHERE m1.meta_key = 'Firstname' and m1.meta_value = 'Chris'
  AND m2.meta_key = 'Lastname' and m2.meta_value = 'Doe'

EDIT:

I suppose I'll have to join N times the table for N key/value pairs ?

This could be done without a join, for example like below (assuming that each id has no more than 1 meta_key value):

SELECT META_DOCUMENT_ID
FROM METADATA 
WHERE (meta_key, meta_value) IN
   ( ('Firstname' ,'Chris'), ('Lastname', 'Doe' ) )
GROUP BY META_DOCUMENT_ID
HAVING COUNT(*) = 2 /* 2 means that we are looking for 2 meta keys */

How is that going to run performance-wise ?

Terribly. See an explanation from links above about this model.

This query must in many cases do a full table scan (especially when a number of attributes/keys we are looking for is more than a few), count values for each id, then pick these id that have count = 2.

In a normalized model this is a simple query that can use indexes to quickly pick only these few rows with firstname = 'Chris'

SELECT *
FROM table 
WHERE firstname = 'Chris' and lastname = 'Doe'

Thanks for the good links. That being said I don't really think any of the two gave a decent alternative besides exposing the weaknesses of that design. I might work on it differently if we figure out a "pattern" of keys though. As for the query, I supposed i'll have to join N times the table for N key/value pairs ? How is that going to run performance-wise ? — Christophe Schutz, Feb 19 '16 at 13:54
I've updated the answer with an example how it could be done without a join. — krokodilko, Feb 19 '16 at 14:06

MT0 · Answer 2 · 2016-02-19T14:59:42.907

Oracle Setup:

CREATE TYPE KEY_VALUE_PAIR IS OBJECT (
  KEY   VARCHAR2(50),
  VALUE VARCHAR2(50)
);
/

CREATE TYPE KEY_VALUE_TABLE IS TABLE OF KEY_VALUE_PAIR;
/

CREATE TABLE meta_data ( meta_key, meta_value, meta_document_id ) AS
SELECT 'Firstname',   'Chris',    1 FROM DUAL UNION ALL
SELECT 'Lastname',    'Doe',      1 FROM DUAL UNION ALL
SELECT 'Phonenumber', '555-2368', 1 FROM DUAL UNION ALL
SELECT 'Firstname',   'Chris',    2 FROM DUAL UNION ALL
SELECT 'Lastname',    'Moe',      2 FROM DUAL UNION ALL
SELECT 'Phonenumber', '555-0001', 2 FROM DUAL;

Query:

SELECT meta_document_id
FROM   (
  SELECT meta_document_id,
         CAST(
           COLLECT(
             KEY_VALUE_PAIR( meta_key, meta_value )
           ) AS KEY_VALUE_TABLE
         ) AS key_values
  FROM   meta_data
  GROUP BY meta_document_id
)
WHERE  KEY_VALUE_TABLE(
         -- Your values here:
         KEY_VALUE_PAIR( 'Firstname', 'Chris' ),
         KEY_VALUE_PAIR( 'Lastname',  'Doe' )
       )
       SUBMULTISET OF key_values;

Output:

 META_DOCUMENT_ID
------------------
                1

Update - Reimplementing the meta data table using a nested table:

Oracle Setup:

CREATE TYPE KEY_VALUE_PAIR IS OBJECT (
  META_KEY   VARCHAR2(50),
  META_VALUE VARCHAR2(50)
);
/

CREATE TYPE KEY_VALUE_TABLE IS TABLE OF KEY_VALUE_PAIR;
/

CREATE TABLE meta_data (
  meta_document_id INT,
  key_values       KEY_VALUE_TABLE
) NESTED TABLE key_values STORE AS meta_data_key_values;

CREATE UNIQUE INDEX META_DATA_KEY_VALUES_IDX ON META_DATA_KEY_VALUES (
  NESTED_TABLE_ID,
  META_KEY,
  META_VALUE
);
/

-- Insert everything in one go:
INSERT INTO META_DATA VALUES(
  1,
  KEY_VALUE_TABLE(
    KEY_VALUE_PAIR( 'Firstname',   'Chris' ),
    KEY_VALUE_PAIR( 'Lastname',    'Doe' ),
    KEY_VALUE_PAIR( 'Phonenumber', '555-2368' )
  )
);

-- Insert everything in bits:
INSERT INTO meta_data VALUE ( 2, KEY_VALUE_TABLE() );

INSERT INTO TABLE( SELECT key_values FROM meta_data WHERE meta_document_id = 2 )
  ( meta_key, meta_value ) VALUES( 'Firstname', 'Chris' );
INSERT INTO TABLE( SELECT key_values FROM meta_data WHERE meta_document_id = 2 )
  ( meta_key, meta_value ) VALUES( 'Lastname', 'Moe' );
INSERT INTO TABLE( SELECT key_values FROM meta_data WHERE meta_document_id = 2 )
  ( meta_key, meta_value ) VALUES( 'Phonenumber', '555-0001' );

--Select all the key-value pairs:
SELECT META_DOCUMENT_ID,
       META_KEY,
       META_VALUE
FROM   META_DATA md,
       TABLE( md.KEY_VALUES );

Query:

The changes above let you simplify the query a lot:

SELECT META_DOCUMENT_ID
FROM   meta_data
WHERE  KEY_VALUE_TABLE(
         -- Your values here:
         KEY_VALUE_PAIR( 'Firstname', 'Chris' ),
         KEY_VALUE_PAIR( 'Lastname',  'Doe' )
       )
       SUBMULTISET OF key_values;

Wow, that's one hell of a solution. I don't even understand half of what you wrote ! :D That being said, do you think this is going to be more efficient performance-wise that having the multiple join clause query posted by kordirko ? — Christophe Schutz, Feb 19 '16 at 14:10
The honest answer is: "Profile both solutions on your data set and see". However, this solution does only require a single table scan and the explain plan only shows one `sort (group by)` and one `filter` so it ought to be reasonable. — MT0, Feb 19 '16 at 14:14
@ChristopheSchutz updated with a slightly modified table structure that ought to improve performance on the `SELECT` at the cost of slightly more complicated `INSERT`s (but not significantly less performant) - however, for the use-cases I'd consider for this, you are likely to be selecting much much more than inserting. — MT0, Feb 19 '16 at 15:05
@ChristopheSchutz I don't use hibernate but you ought to be able to set it up to [pass an array](http://stackoverflow.com/questions/4712566/hibernate-and-oracle-varrays-nested-table) of a [user defined type](http://dinukaroshan.blogspot.co.uk/2009/08/hibernate-and-oracle-user-defined-types.html) directly to Oracle so you don't need to use any dynamic SQL with this solution. — MT0, Feb 19 '16 at 15:26

score 0 · Answer 3 · answered Feb 19 '16 at 14:03

0

If you know in advance all the possible TAGS, an approach could be with some PIVOT:

with METADATA (META_KEY, META_VALUE, META_DOCUMENT_ID) as
(
select 'Firstname', 'Chris',1 from dual union all
select 'Lastname', 'Doe',1 from dual union all
select 'Firstname', 'Chris',2 from dual union all
select 'Lastname', 'Moe',2 from dual
)
select *
from metadata
PIVOT  ( max (META_VALUE ) FOR (META_KEY) IN ('Firstname' AS Firstname, 'Lastname' AS Lastname))
where Firstname = 'Chris' /* and Lastname ='Doe' ...*/

answered Feb 19 '16 at 14:03

Aleksej

22,443
5
33
38

I don't... else I wouldn't even have this kind of model in the first place ^^ – Christophe Schutz Feb 19 '16 at 14:11
1

@ChristopheSchutz Presumably you'll which tags you're querying for at runtime, though? If so, you ought to be able to construct the pivot as per your requirements dynamically. – Boneist Feb 19 '16 at 17:19

Selecting id's where all rows match

3 Answers3