MYSQL mutual distinct columns in multiple rows

Question

I have a table named content

content has content_id, android_id and user_id

  content_id | android_id | user_id    
  a1         |   b1       |    c1
  a1         |   b2       |    c2
  a1         |   b3       |    c1
  a1         |   b4       |    c3
  a1         |   b1       |    c4
  a2         |   b2       |    c1

I want to select count of distinct android_id and user_id for a specific content_id

(if a specific android_id has appeared in any of rows, it should not appear in any other row even if its user_id is distinct; similarly for user_id i.e. no two user_id in any of selected rows should match)

i.e. for content_id=a1, following rows should be selected

a1          |   b1        |    c1
a1          |   b2        |    c2
a1          |   b4        |    c3

i.e. b1,c1,b2,c2,b4,c3 is mutually distinct and didn't appear in any of multiple rows

I want MYSQL Query. Thanks

I don't understand either. Why isn't `a1 | b3 | c1` selected, for example? — Chris Lear, May 25 '17 at 08:25
By combining letters and numbers, you've made this problem tediously complicated. — Strawberry, May 25 '17 at 08:26
@ChrisLear a1 | b3 | c1 is not selected because c1 has already occurred in a1 | b1 | c1 — user609306, May 25 '17 at 08:28
no two rows could contain android_id and user_id which has appeared in any of rows before — user609306, May 25 '17 at 08:29
i suspect (hope) that your data set is wrong. And that you instead meant 'a2' on the 5th line — Strawberry, May 25 '17 at 08:30
no a1, a2 are separate, we are doing this analysis for only a1 i.e. content_id=a1 — user609306, May 25 '17 at 08:31
@Barmar it selected b1 c1 because this row is above b3 c1 in database. Now since c1 has appeared in b1,c1, it can't appear again in b3 c1 — user609306, May 25 '17 at 08:39
Is this a correct formulation? For a given content_id you want to select all rows that do not contain either an android_id or a user_id that has been previously selected. The sort order is android_id, user_id ascending. — Chris Lear, May 25 '17 at 08:41
@user609306 There's no ordering to rows in a MySQL table. Do you have another column that specifies the order of the rows, like an auto-incremented ID? — Barmar, May 25 '17 at 08:42
This seems like it will be very complicated to do in SQL. Can you use an application programming language to process the result and remove duplicates? — Barmar, May 25 '17 at 08:43
@GordonLinoff I think we need someone with your expertise to figure this one out. — Barmar, May 25 '17 at 08:45
@ChrisLear Yes, You are exactly correct. Though sort order could be anything. — user609306, May 25 '17 at 08:49
If sort order could be anything, the output is not determined — Chris Lear, May 25 '17 at 08:50
@Barmar i have tried doing so in PHP, what i am doing is i store selected values from row in an array. Now for every new rows, i check if value in present in array or not. If value is present i would not select it, otherwise i would select row and insert column values in array. It happens in while loop of row select PHP MYSQL. — user609306, May 25 '17 at 08:52
@ChrisLear No worries, Please read my question ----count of distinct android_id and user_id for a specific content_id--- so i want count of such rows, i don't want rows. i have mentioned example to explain my question — user609306, May 25 '17 at 08:53
@Barmar i thought if i could clean my code, if it could be done in one SQL statement. But i couldn't do — user609306, May 25 '17 at 08:54
@user609306 This kind of thing is not hard to do in SQL for a single column, but when you want to exclude multiple columns in any combination I think it will be difficult. — Barmar, May 25 '17 at 08:54

score 1 · Answer 1 · answered May 25 '17 at 08:56

1

This query gets the result expected, based on the data given:

select c1.* from
    (select * from content where content_id = 'a1' group by android_id) c1
    join
    (select * from content where content_id = 'a1' group by user_id) c2
    on
    c1.android_id = c2.android_id and c1.user_id = c2.user_id;

It relies on a mysql 'feature' explained in this question: MySQL Select rows on first occurrence of each unique value, whereby you can return the first distinct value using a group without an aggregator.

answered May 25 '17 at 08:56

Chris Lear

6,592
1
18
26

(Whatever the reality) the manual itself claims that this 'feature' no longer exists. – Strawberry May 25 '17 at 08:59
It still works, and given that the sorting isn't even significant, it seems safe. But @Strawberry has given what is probably a better answer, so it's probably best to use that. – Chris Lear May 25 '17 at 09:03

score 1 · Accepted Answer · answered May 25 '17 at 09:04

1

Assuming you have an id column that orders the rows, this produces the desired result.

SELECT COUNT(*)
FROM content AS c1
LEFT JOIN content AS c2
ON c1.content_id = c2.content_id AND c1.id > c2.id
AND (c1.android_id = c2.android_id OR c1.user_id = c2.user_id)
WHERE c1.content_id = 'a1' AND c2.id IS NULL

It's based on the usual LEFT JOIN/NULL pattern to find rows that don't match in another table. In this case, the other table is the same table, and c1.id > c2.id makes it return the first row in the table in the mutually exclusive set. The joining condition then test both android_id and user_id; if either of them match, they're exluded from the result.

DEMO

answered May 25 '17 at 09:04

Barmar

741,623
53
500
612

I like the sqlfiddle UI, but I have to admit that rextester is much faster – Strawberry May 25 '17 at 09:07
@Strawberry True, sqlfiddle is an old habit. – Barmar May 25 '17 at 09:09
The drawback with rextester of course is that you have to find tablenames that haven't had FK constraints placed upon them :-( – Strawberry May 25 '17 at 09:10

Strawberry · Answer 3 · 2017-05-25T09:05:23.450

With thanks to Chris Lear for clearing up the logic...

DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table
(content_id INT NOT NULL
,android_id INT NOT NULL
,user_id INT NOT NULL
,PRIMARY KEY(content_id,android_id,user_id)
);

INSERT INTO my_table VALUES
(1, 1, 1),
(1, 2, 2),
(1, 3, 1),
(1, 4, 3),
(1, 1, 4),
(2, 2, 1);

SELECT x.* 
  FROM my_table x 
  LEFT 
  JOIN my_table y  
    ON y.content_id = x.content_id 
   AND 
     (
       ( y.user_id = x.user_id AND y.android_id < x.android_id ) 
    OR ( y.user_id < x.user_id AND y.android_id = x.android_id )
     ) 
 WHERE y.content_id IS NULL;
+------------+------------+---------+
| content_id | android_id | user_id |
+------------+------------+---------+
|          1 |          1 |       1 |
|          1 |          2 |       2 |
|          1 |          4 |       3 |
|          2 |          2 |       1 |
+------------+------------+---------+

If you insist on using alphanumeric keys, then I would suggest converting the above to a composite UNIQUE key, and adding a surrogate PK instead. You would then do the '<' comparison on the PK.

Obviously?, the 'count' is simply the number of rows returned, but if you want a count for each content_id, just change the SELECT to SELECT content_id, COUNT(*) total and add GROUP BY content_id at the end.

MYSQL mutual distinct columns in multiple rows

3 Answers3