1

I have the following DB tables that I am trying to query:

t_shared_users
user_id
user_category
folder_id
expiry

t_documents
id
folder_id
user_id
user_category
description
created
updated

t_folder
id
type
user_id
user_category
created
updated

I would like to find all the documents you own and have shared access to. ie. search for all documents in t_documents where user_id = 1 AND user_category = 100 but also include those documents in the folder you have access to in t_shared_users. Here is my attempt at the query:

  SELECT
    id,
    folder_id,
    user_id,
    user_category,
    description,
    created,
    updated
  FROM
    t_documents
  WHERE
    user_category = 100
    AND user_id = 1

  UNION ALL

  SELECT
    d.id,
    d.folder_id,
    d.user_id,
    d.user_category,
    d.description,
    d.created,
    d.updated
  FROM
    t_documents d
  JOIN
    t_shared_users s
  ON
    d.folder_id = s.folder_id
  WHERE
    d.user_category = 100
    d.AND user_id = 1

 ORDER BY
   created ASC
 LIMIT
   10

Is there any better/more performant/concise way to write this query? The above seems a little verbose and slow.

edit:

CREATE TABLE t_folder (
  id            SERIAL                   NOT NULL,
  user_category SMALLINT                 NOT NULL,
  user_id       INTEGER                  NOT NULL,
  type          INTEGER                  NOT NULL,
  description   TEXT,
  created       TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
  updated       TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
  PRIMARY KEY (id)
);

CREATE TABLE t_documents (
  id            BIGSERIAL                NOT NULL,
  folder_id     INTEGER,
  user_category SMALLINT                 NOT NULL,
  user_id       INTEGER                  NOT NULL,
  description   TEXT                     NOT NULL,
  created       TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
  updated       TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
  PRIMARY KEY (id)
);

CREATE TABLE t_shared_users (
  id            SERIAL,
  folder_id     INTEGER                  NOT NULL,
  user_category INTEGER                  NOT NULL,
  user_id       INTEGER                  NOT NULL,
  expiry        TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
  PRIMARY KEY (id)
);
John Baum
  • 3,183
  • 11
  • 42
  • 90

2 Answers2

1

This is your query:

  SELECT
    id,
    folder_id,
    user_id,
    user_category,
    description,
    created,
    updated
  FROM
    t_documents
  WHERE
    user_category = 100
    AND user_id = 1

  UNION ALL

  SELECT
    d.id,
    d.folder_id,
    d.user_id,
    d.user_category,
    d.description,
    d.created,
    d.updated
  FROM
    t_documents d
  JOIN
    t_shared_users s
  ON
    d.folder_id = s.folder_id
  WHERE
    d.user_category = 100
    AND d.user_id = 1 -- your query actually has a typo here

What I don't understand about the above query is why you are filtering on d.user_category and d.user_id (t_documents table) in the bottom part of the query. Are you sure you didn't mean s.user_category and s.user_id (t_shared_users)? If not, what is the point of joining with t_shared_users?

Assuming that I am correct that your query is in error, this is how I would rewrite it:

select d.*
  from t_documents d
 where d.user_category = 100
   and d.user_id = 1
 union
select d.*
  from t_shared_users s
  join t_documents d
    on d.folder_id = s.folder_id
 where s.user_category = 100
   and s.user_id = 1

Notice that I use union instead of union all, as I believe it's technically possible to get possibly unwanted duplicate documents otherwise.

Also, just as a rough approximation, these are the indexes I would define for good performance:

  • t_documents (user_id, user_category)
  • t_documents (folder_id)
  • t_shared_users (user_id, user_category, folder_id)
sstan
  • 35,425
  • 6
  • 48
  • 66
  • Do you think its better for t_shared_users to have those indexes split into two? ie. one for just folder_id and the other one a composite on user_id, user_category? That is what I have currently. – John Baum Oct 16 '16 at 03:48
  • Also, when would you think a duplicate could occur with UNION ALL? The performance difference between using UNION vs UNION ALL i non trivial. I have this constraint on t_shared_users btw : CONSTRAINT constraint_user_per_doc UNIQUE (user_category, user_id, folder_id) – John Baum Oct 16 '16 at 03:49
  • 1
    About your index question: I think that your current split indexes should provide acceptable performance. But I do think that adding `folder_id` to the `user_id, user_category` index should allow for a better execution plan. You'd have to try it out to see how much of a difference it actually makes. About the potential for duplicates: I don't know your business rules, but if it's possible for a document to be both owned by user `1`, and also at the same time to be part of a folder that is shared to user `1`, then a `union all` would return that document twice. `union` would fix that. – sstan Oct 16 '16 at 04:02
-1

Starting from the query, you have given, I would replace join with left join

select
    d.id,
    d.folder_id,
    d.user_id,
    d.user_category,
    d.description,
    d.created,
    d.updated
from t_documents d
left join t_shared_users s on d.folder_id = s.folder_id
where (d.user_category = 100 and d.user_id = 1)
      or (s.user_category = 100 and s.user_id = 1)

This would give you all entries from t_documents with user_id = 1 and user_category = 100, and also all entries with the same where clause, where you have access to shared documents.

Community
  • 1
  • 1
Olaf Dietsche
  • 72,253
  • 8
  • 102
  • 198
  • I get an error when i dont specify the table in the where clause "ERROR: column reference "user_category" is ambiguous." If i were to pick d.user_type or s.user_type, the query would be wrong – John Baum Oct 15 '16 at 19:54
  • Then the query you provided must give the same error. – Olaf Dietsche Oct 15 '16 at 20:36
  • Yes, I just noticed the typo. Given the query above now, is there a way to adapt your answer to simplify it? – John Baum Oct 15 '16 at 20:49
  • Prefixing the columns with `d.` should work then, see updated answer. – Olaf Dietsche Oct 15 '16 at 21:08
  • It does not. It leaves out those documents whose owner is in the shared table because you are filtering the where by d.user_id – John Baum Oct 15 '16 at 21:10
  • What I would need is something like: where d.user_category = 100 and d.user_id = 1 OR where s.user_category = 100 and s.user_id = 1, but that is incredibly slow – John Baum Oct 15 '16 at 21:11
  • When `(d.user_category = 100 and d.user_id = 1) or (s.user_category = 100 and s.user_id = 1)` is necessary to get correct results, you must use it. – Olaf Dietsche Oct 15 '16 at 21:25