1

I have a table with data like following, want to return those group_id with unique data. Both group_id 3 and 4 have two component 123 and 456, so they are "duplicated", we just need to return the smaller group_id, that's 3. Also group_id 5 doesn't have a duplication, it can be returned. So we want group_id 3 and 5 to be returned.

How can I write a SQL query against postgres database to achieve that? Thank you!

id group_id component_id
1 3 123
2 3 456
3 4 123
4 4 456
5 5 123
forpas
  • 160,666
  • 10
  • 38
  • 76

3 Answers3

0
SELECT group_id, MIN(component_id)
FROM   MyTable
GROUP  BY group_id
HAVING COUNT(*) > 1
SQLpro
  • 3,994
  • 1
  • 6
  • 14
  • Thanks for your suggestion. There are 3 groups in above table, group_id 3 with component 123, 456, group_id 4 with component 123, 456, group_id 5 with component 123 we treat group_id 3 and 4 are duplicated because they all have the same components. We need to return 3 and 5. I think your query returns 3, 123; 4, 123. – zhao xiongwei Dec 21 '21 at 14:24
  • Just remove the "MIN(component_id)" from the SELECT clause of the SELECT statement. – SQLpro Dec 21 '21 at 14:38
0

Here's a method to assign the group_id's to the component_id's.

It uses a recursive CTE with arrays to find the possible combinations.
The recursion starts from the lonely group_id's.

Then the next CTE picks one of the longest combinations.

WITH RECURSIVE RCTE AS (
    SELECT id, group_id, component_id
    , 1 as Lvl
    , array[group_id] as group_ids
    , array[component_id] as component_ids
    FROM YourTable
    WHERE group_id IN (
      SELECT group_id
      FROM YourTable
      GROUP BY group_id
      HAVING COUNT(*) = 1
    )
    UNION ALL
    SELECT t.id, t.group_id, t.component_id
    , Lvl+1
    , cte.group_ids || t.group_id
    , cte.component_ids || t.component_id
    FROM RCTE cte
    JOIN YourTable t 
      ON t.group_id != ALL(group_ids)
     AND t.component_id != ALL(component_ids)
)
, CTE_ARRAYS AS (
    SELECT group_ids, component_ids
    FROM RCTE
    ORDER BY array_length(group_ids, 1) desc, Lvl desc
    LIMIT 1
) 
SELECT a.group_id, a.component_id
FROM CTE_ARRAYS c
CROSS JOIN LATERAL UNNEST(c.group_ids, c.component_ids) WITH ORDINALITY AS a(group_id, component_id)
ORDER BY a.group_id;
group_id component_id
3 456
5 123

db<>fiddle here

LukStorms
  • 28,916
  • 5
  • 31
  • 45
  • Thank you! This looks like a good solution. – zhao xiongwei Dec 21 '21 at 14:55
  • If change the last row to , (5, 5, 789) and add one more row (6, 3, 789) and change , then expected result is group id 3, 4, 5. But the result is 4, 4, 5. https://dbfiddle.uk/?rdbms=postgres_10&fiddle=7ca25a675618760a1f3c39587a10e133 – zhao xiongwei Dec 21 '21 at 15:05
  • @zhaoxiongwei The solution has been completely changed. It's recursive now. – LukStorms Dec 21 '21 at 22:22
0

Use 2 levels of aggregation:

SELECT MIN(group_id) group_id
FROM (
  SELECT group_id, STRING_AGG(component_id::text, ',' ORDER BY component_id) components
  FROM tablename
  GROUP BY group_id
) t
GROUP BY components;

See the demo.

forpas
  • 160,666
  • 10
  • 38
  • 76