-1

I have a table named content

content has content_id, android_id and user_id

  content_id | android_id | user_id    
  a1         |   b1       |    c1
  a1         |   b2       |    c2
  a1         |   b3       |    c1
  a1         |   b4       |    c3
  a1         |   b1       |    c4
  a2         |   b2       |    c1

I want to select count of distinct android_id and user_id for a specific content_id

(if a specific android_id has appeared in any of rows, it should not appear in any other row even if its user_id is distinct; similarly for user_id i.e. no two user_id in any of selected rows should match)

i.e. for content_id=a1, following rows should be selected

a1          |   b1        |    c1
a1          |   b2        |    c2
a1          |   b4        |    c3

i.e. b1,c1,b2,c2,b4,c3 is mutually distinct and didn't appear in any of multiple rows

I want MYSQL Query. Thanks

Dharman
  • 30,962
  • 25
  • 85
  • 135
user609306
  • 1,333
  • 6
  • 17
  • 27
  • 1
    What have you tried so far? Show us the query you've tried. – Styphon May 25 '17 at 08:21
  • 1
    I don't understand your requirements. – Tim Biegeleisen May 25 '17 at 08:24
  • I don't understand either. Why isn't `a1 | b3 | c1` selected, for example? – Chris Lear May 25 '17 at 08:25
  • By combining letters and numbers, you've made this problem tediously complicated. – Strawberry May 25 '17 at 08:26
  • @ChrisLear a1 | b3 | c1 is not selected because c1 has already occurred in a1 | b1 | c1 – user609306 May 25 '17 at 08:28
  • no two rows could contain android_id and user_id which has appeared in any of rows before – user609306 May 25 '17 at 08:29
  • i suspect (hope) that your data set is wrong. And that you instead meant 'a2' on the 5th line – Strawberry May 25 '17 at 08:30
  • no a1, a2 are separate, we are doing this analysis for only a1 i.e. content_id=a1 – user609306 May 25 '17 at 08:31
  • How does it know to select `b1 c1` and not `b3 c1`? – Barmar May 25 '17 at 08:33
  • @Barmar it selected b1 c1 because this row is above b3 c1 in database. Now since c1 has appeared in b1,c1, it can't appear again in b3 c1 – user609306 May 25 '17 at 08:39
  • Is this a correct formulation? For a given content_id you want to select all rows that do not contain either an android_id or a user_id that has been previously selected. The sort order is android_id, user_id ascending. – Chris Lear May 25 '17 at 08:41
  • Every combination is unique, right? – Strawberry May 25 '17 at 08:42
  • @user609306 There's no ordering to rows in a MySQL table. Do you have another column that specifies the order of the rows, like an auto-incremented ID? – Barmar May 25 '17 at 08:42
  • This seems like it will be very complicated to do in SQL. Can you use an application programming language to process the result and remove duplicates? – Barmar May 25 '17 at 08:43
  • @GordonLinoff I think we need someone with your expertise to figure this one out. – Barmar May 25 '17 at 08:45
  • Is b1,c1 mutually distinct? I cannot see how. – Strawberry May 25 '17 at 08:47
  • @ChrisLear Yes, You are exactly correct. Though sort order could be anything. – user609306 May 25 '17 at 08:49
  • If sort order could be anything, the output is not determined – Chris Lear May 25 '17 at 08:50
  • @Barmar i have tried doing so in PHP, what i am doing is i store selected values from row in an array. Now for every new rows, i check if value in present in array or not. If value is present i would not select it, otherwise i would select row and insert column values in array. It happens in while loop of row select PHP MYSQL. – user609306 May 25 '17 at 08:52
  • @ChrisLear No worries, Please read my question ----count of distinct android_id and user_id for a specific content_id--- so i want count of such rows, i don't want rows. i have mentioned example to explain my question – user609306 May 25 '17 at 08:53
  • @Barmar i thought if i could clean my code, if it could be done in one SQL statement. But i couldn't do – user609306 May 25 '17 at 08:54
  • @user609306 This kind of thing is not hard to do in SQL for a single column, but when you want to exclude multiple columns in any combination I think it will be difficult. – Barmar May 25 '17 at 08:54

3 Answers3

1

This query gets the result expected, based on the data given:

select c1.* from
    (select * from content where content_id = 'a1' group by android_id) c1
    join
    (select * from content where content_id = 'a1' group by user_id) c2
    on
    c1.android_id = c2.android_id and c1.user_id = c2.user_id;

It relies on a mysql 'feature' explained in this question: MySQL Select rows on first occurrence of each unique value, whereby you can return the first distinct value using a group without an aggregator.

Chris Lear
  • 6,592
  • 1
  • 18
  • 26
  • (Whatever the reality) the manual itself claims that this 'feature' no longer exists. – Strawberry May 25 '17 at 08:59
  • It still works, and given that the sorting isn't even significant, it seems safe. But @Strawberry has given what is probably a better answer, so it's probably best to use that. – Chris Lear May 25 '17 at 09:03
1

Assuming you have an id column that orders the rows, this produces the desired result.

SELECT COUNT(*)
FROM content AS c1
LEFT JOIN content AS c2
ON c1.content_id = c2.content_id AND c1.id > c2.id
AND (c1.android_id = c2.android_id OR c1.user_id = c2.user_id)
WHERE c1.content_id = 'a1' AND c2.id IS NULL

It's based on the usual LEFT JOIN/NULL pattern to find rows that don't match in another table. In this case, the other table is the same table, and c1.id > c2.id makes it return the first row in the table in the mutually exclusive set. The joining condition then test both android_id and user_id; if either of them match, they're exluded from the result.

DEMO

Barmar
  • 741,623
  • 53
  • 500
  • 612
0

With thanks to Chris Lear for clearing up the logic...

DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table
(content_id INT NOT NULL
,android_id INT NOT NULL
,user_id INT NOT NULL
,PRIMARY KEY(content_id,android_id,user_id)
);

INSERT INTO my_table VALUES
(1, 1, 1),
(1, 2, 2),
(1, 3, 1),
(1, 4, 3),
(1, 1, 4),
(2, 2, 1);

SELECT x.* 
  FROM my_table x 
  LEFT 
  JOIN my_table y  
    ON y.content_id = x.content_id 
   AND 
     (
       ( y.user_id = x.user_id AND y.android_id < x.android_id ) 
    OR ( y.user_id < x.user_id AND y.android_id = x.android_id )
     ) 
 WHERE y.content_id IS NULL;
+------------+------------+---------+
| content_id | android_id | user_id |
+------------+------------+---------+
|          1 |          1 |       1 |
|          1 |          2 |       2 |
|          1 |          4 |       3 |
|          2 |          2 |       1 |
+------------+------------+---------+

If you insist on using alphanumeric keys, then I would suggest converting the above to a composite UNIQUE key, and adding a surrogate PK instead. You would then do the '<' comparison on the PK.

Obviously?, the 'count' is simply the number of rows returned, but if you want a count for each content_id, just change the SELECT to SELECT content_id, COUNT(*) total and add GROUP BY content_id at the end.

Strawberry
  • 33,750
  • 13
  • 40
  • 57