Select rows so that two of the columns are separately unique

Question

Table user_book describes every user's favorite books.

CREATE TABLE user_book (
  user_id INT,
  book_id INT,
  FOREIGN KEY (user_id) REFERENCES user(id),
  FOREIGN KEY (book_id) REFERENCES book(id)
);

insert into user_book (user_id, book_id) values
(1, 1),
(1, 2),
(1, 5),
(2, 2),
(2, 5),
(3, 2),
(3, 5);

I want to write a query (possibly a with clause that defines multiple statements ― but not a procedure) that would try to distribute ONE favorite book to every user who has one or more favorite books.

Any ideas how to do it?

More details:

The distribution plan may be naive. i.e. it may look as if you went user after user and each time randomly gave the user whatever favorite book was still available if there was any, without considering what would be left for the remaining users.

This means that sometimes some books may not be distributed, and/or sometimes some users may not get any book (example 2). This can happen when the numbers of books and users are not equal, and/or due to the specific distribution order that you have used.

A book cannot be distributed to two different users (example 3).

Examples:

1. A possible distribution:

(1, 1)
(2, 2)
(3, 5)

2. A possible distribution (here user 3 got nothing, and book 1 was not distributed. That's acceptable):

(1, 2)
(2, 5)

3. An impossible distribution (both users 1 and 2 got book 2, that's not allowed):

(1, 2)
(2, 2)
(3, 5)

Similar questions that are not exactly this one:

How to select records without duplicate on just one field in SQL?

SQL: How do I SELECT only the rows with a unique value on certain column?

How to select unique records by SQL

score 0 · Answer 1 · answered Aug 26 '22 at 10:56

0

The user_book table should also have a UNIQUE(user_id, book_id) constraint.

A simple solution like this returns a list in which each user gets zero or one book and each book is given to zero or one user:

WITH list AS (SELECT user_id, MIN(book_id) AS fav_book FROM user_book GROUP BY user_id)
SELECT fav_book, MIN(user_id) FROM list GROUP BY fav_book

answered Aug 26 '22 at 10:56

fredt

24,044
3
40
61

Thank you for writing. You suggestion may distribute only book #1 if this book is every user's favorite. It's less than a naive distribution because you still have undistributed books and users without a book who have those books among their favorites. It looks like this question is not a trivial one... – rapt Aug 28 '22 at 19:07
The solution that you want has also this condition: the maximum number of books are distributed. This is a non-trivial discreet optimization problem. – fredt Aug 30 '22 at 10:29
Actually because I may be limited in what can be done by one query (even with a `with` clause), i.e. without a procedure, I specified that in this question, an acceptable solution may be less strong than a maximum number of books being distributed. I wrote "The distribution plan may be naive", and the meaning is that if a distribution is such that we cannot distribute more books without taking a book away from a user who already got one, then this this distribution is an acceptable solution. Although it's possibly not a maximal distribution. – rapt Sep 05 '22 at 07:47
I have asked in another forum about the different question of finding a maximal distribution, and there is a polynomial time algorithm to do that. But it seems more realistic to implement it by a procedural language. – rapt Sep 05 '22 at 07:51
A possible algorithm is to recursively: select the users with the minimum number of favourite books, and allocate the maximum possible number of books among these users, then eliminate all these users and eliminate the allocated books from all other users. – fredt Sep 06 '22 at 10:52
If by "the minimum number" you mean the minimum `book_id`, then multiple users may have this as one of their favorites. If the book is still available, we can give this book only once. I don't see how to allocate more than one book per iteration... I wrote something that accomplishes the distribution by using the working table, but it seems like it may be shorter and more efficient to use the RECURSIVE_TABLE. Can you show here or provide a link to an example of how to use RECURSIVE_TABLE in a query? I could not see any example in the documentation. – rapt Sep 09 '22 at 19:59
The "minimum number of favourite books" refers to how many favourites a user has, not the book_id. In your example, users 2 and 3 each have 2 books, which is the minimum to start with. Re the RECURSIVE_TABLE, you can refer to it in EXISTS or IN predicates used in the WHERE clause of the second part of the recursive query's UNION. – fredt Sep 10 '22 at 21:28

Select rows so that two of the columns are separately unique

1 Answers1