1

I want to join four tables with null values and without duplicates and afterwards translate it into an SQLAlchemy query.

The tables are (simplified):

Category(id, name)
Task(id, category.id, name)
User(id, name)

And an many-to-many table:

Solved(task.id, user.id)

I want to get all tasks with their category and a column with an specific user who solved the task:

+---------------+-----------+-----------+
| category.name | task.name | user.name |
+---------------+-----------+-----------+
| abc           | abctask1  | <null>    |
| abc           | abctask2  | luke      |
| def           | deftask1  | <null>    |
| ghi           | ghitask1  | <null>    |
| ghi           | ghitask2  | luke      |
+---------------+-----------+-----------+

At the moment I've got 3 to 4 individual SQLAlchemy-queries to perform that task. If possible, it should be merged into only one query to avoid too many reads on the database.

So far I've got:

SELECT DISTINCT
  cat.name, t.name, u.name
FROM
  Task t
JOIN 
  Category cat ON cat.id = t.category_id
LEFT JOIN 
  Solved s ON s.task_id = t.id
LEFT JOIN 
  User u ON s.user_id = u.id AND
  u.name = 'luke'
ORDER BY
  cat.name

But, although DISTINCT, I got duplicates from all rows with the given user:

+---------------+-----------+-----------+
| category.name | task.name | user.name |
+---------------+-----------+-----------+
| abc           | abctask1  | <null>    |
| abc           | abctask2  | luke      |
| abc           | abctask2  | <null>    | <-- duplicate
| def           | deftask1  | <null>    |
| ghi           | ghitask1  | <null>    |
| ghi           | ghitask2  | luke      |
| ghi           | ghitask2  | <null>    | <-- duplicate
+---------------+-----------+-----------+

Is there a possibility to get this table with one query and translate it to SQLAlchemy?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Topiks
  • 80
  • 1
  • 10
  • For the record: DISTINCT will only coalesce rows which are *exactly the same*. It will not remove "almost-duplicates" like those shown in the OP. – Kevin Jul 17 '15 at 21:46
  • There are no duplicates in the displayed result. Seems like you actually want distinct *tasks* in the result (exactly one row per task) - with the given user only if s(he) solved it? BTW, `User` is not a valid table name without double quotes (reserved word). – Erwin Brandstetter Jul 17 '15 at 23:52

2 Answers2

1

The problem comes from your data, ie you probably have 2 task called abctask2/ ghitask2. Maybe you should put a constraint on the taskname. You query works well.

http://sqlfiddle.com/#!9/c4647c/4

Try to check with

SELECT category_id, name ,count(*) from TASK GROUP BY category_id, name HAVING COUNT(*)<>1
Aurelien Ecoto
  • 206
  • 2
  • 5
1

You have two LEFT JOINS:

  • The 1st left join can join to multiple rows from solved. Say, 'jane' and 'luke' solved the task.
  • The 2nd left join can only join to users named 'luke' ('luke' in the join condition!).

You still get both rows, 'jane' is just not shown, the join condition filters her out, but the LEFT JOIN preserves the row in the result anyway and appends NULL values.

You can achieve what you are after by using parentheses and an [INNER] JOIN instead of the LEFT JOIN between solved and users. The manual:

Use parentheses if necessary to determine the order of nesting. In the absence of parentheses, JOINs nest left-to-right.

SELECT c.name AS cat_name, t.name AS task_name, u.name AS user_name
FROM   task t
JOIN   category c ON cat.id = t.category_id
LEFT   JOIN
      (solved s JOIN users u ON u.id = s.user_id AND u.name = 'luke') ON s.task_id = t.id
ORDER  BY 1, 2, 3;
  • Using table name users instead of the reserved word user.

  • Assuming that users.name is defined unique or you can have multiple users named 'luke'.

  • If (task.id, users.id) in solved is defined UNIQUE or PRIMARY KEY, you don't need DISTINCT at all.

The resulting query is not only correct, but also faster.


SqlAlchemy version of the above query: (contributed by @van)
This assumes that Category, Task and User are mapped classes, while solved is instance of Table (just an association table as shown in code example Many to Many):

user_name = 'luke'
q = (session.query(Category.name, Task.name, User.name)
     .select_from(Task)
     .join(Category)
     .outerjoin(
         join(solved, User,
              (solved.c.user_id == User.id) & (User.name == user_name),
         ))
     .order_by(Category.name, Task.name, User.name)
     )
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228