4

I have one table like this:

ID, ItemsID
1   2                                                                       
1   3     
1   4   
2   3  
2   4  
2   2

I want to delete tuples like ID=2 because 2,3,4 are same as 3,4,2 in my situation.

How can I do that using SQL?

Ben
  • 51,770
  • 36
  • 127
  • 149
  • The original title was confusing and made me think of a different question about duplicated rows. In DB theory a tuple is the same as a row. – madth3 Nov 17 '12 at 02:30
  • Which version of Oracle? – APC Nov 17 '12 at 09:07
  • You mean you want to delete ID=2 and keepID=1 because all rows with ID=2 have the same itemsID values as those rows ID=1? Additionally: your statement seems to indicate that the rows have an actual order, but they don't. You could also get the rows in the order 4,3,2 or 2,4,3 or 3,2,4 as long as you don't specify an order by statement. –  Nov 17 '12 at 10:31

3 Answers3

1

Sorry, I didn't see the Oracle tag in time. However, I'll leave the MySQL solution in place for reference. Apparently there is something like GROUP_CONCAT() in some Oracle versions.


It might not be the most elegant solution but this will do the job:

DELETE FROM t WHERE ID IN (
  SELECT ID
  FROM (SELECT ID, GROUP_CONCAT(ItemsID ORDER BY ItemsID) AS tuple FROM t GROUP BY ID) AS tuples
  WHERE EXISTS (
    SELECT TRUE
    FROM (SELECT ID, GROUP_CONCAT(ItemsID ORDER BY ItemsID) AS tuple FROM t GROUP BY ID) tuples2
    WHERE tuples2.tuple = tuples.tuple
    AND tuples2.ID < tuples.ID
  )
)

SQLfiddle

You might want to adjust group_concat_max_len.

Community
  • 1
  • 1
AndreKR
  • 32,613
  • 18
  • 106
  • 168
1

Another approach. Oracle 10gR1 or higher. In this example we have the same set of ItemsID values for IDs 1, 2, 6, another one for 3 and another one for 4 and 5. So we are going to delete IDs 2, 6 and 5, as they appear to be duplicates, by representing each set of ItemsID elements of a particular ID group as a nested table and using multiset except operator to determine if the elements in a group are the same:

-- set-up
SQL> create table tb_table(
  2    id number,
  3    itemsid number);

Table created

SQL> insert into tb_table(id, itemsid)
  2    select 1, 2 from dual union all
  3    select 1, 3 from dual union all
  4    select 1, 4 from dual union all
  5    select 2, 4 from dual union all
  6    select 2, 3 from dual union all
  7    select 2, 2 from dual union all
  8    select 3, 2 from dual union all
  9    select 3, 3 from dual union all
 10    select 3, 6 from dual union all
 11    select 3, 4 from dual union all
 12    select 4, 1 from dual union all
 13    select 4, 2 from dual union all
 14    select 4, 3 from dual union all
 15    select 5, 1 from dual union all
 16    select 5, 2 from dual union all
 17    select 5, 3 from dual union all
 18    select 6, 2 from dual union all
 19    select 6, 4 from dual union all
 20    select 6, 3 from dual;

19 rows inserted

SQL> commit;

Commit complete

SQL> create or replace type t_numbers as table of number;
  2  /

Type created

-- contents of the table
SQL> select *
  2    from tb_table;

        ID    ITEMSID
---------- ----------
         1          2
         1          3
         1          4
         2          4
         2          3
         2          2
         3          2
         3          3
         3          6
         3          4
         4          1
         4          2
         4          3
         5          1
         5          2
         5          3
         6          2
         6          4
         6          3


19 rows selected

SQL> delete from tb_table
  2   where id in (with DataGroups as(
  3                  select id
  4                       , grp
  5                       , (select count(*) from table(grp)) cnt
  6                    from (select id
  7                               , cast(collect(itemsid) as t_numbers) grp
  8                           from tb_table
  9                          group by id
 10                          )
 11                   )
 12                   select distinct id2
 13                     from ( select dg1.id as id1
 14                                 , dg2.id as id2
 15                                 , (dg1.grp multiset except dg2.grp) res
 16                                 , dg1.cnt
 17                             from DataGroups Dg1
 18                            cross join DataGroups Dg2
 19                            where dg1.cnt = dg2.cnt
 20                            order by dg1.id
 21                           ) t
 22                    where res is empty
 23                      and id2 > id1
 24                   )
 25  ;

9 rows deleted

 SQL> select *
  2    from tb_table;

        ID    ITEMSID
---------- ----------
         1          2
         1          3
         1          4
         3          2
         3          3
         3          6
         3          4
         4          1
         4          2
         4          3

10 rows selected
Nick Krasnov
  • 26,886
  • 6
  • 61
  • 78
0

This is the best I can come up with, but somehow I have the feeling that there must be a simpler solution:

delete from items
where id in (
  select id 
  from (
    with counts as (
       select id, 
              count(*) as cnt
       from items
       group by id
    )
    select c1.id, row_number() over (order by c1.id) as rn
    from counts c1
      join counts c2 
        on c1.id <> c2.id and c1.cnt = c2.cnt
    and not exists (select i1.itemsid
                    from items i1
                    where i1.id = c1.id
                    minus 
                    select i2.itemsid
                    from items i2
                    where i2.id = c2.id)
  ) t
  where rn <> 1
);

It will work for any number of itemsid values.

the rn <> 1 combined with an ascending sort in the window definition will keep the smallest id in the table (in your case the 1). If you want to keep the highest ID value, you need to change the sort order to over (order by c1.id desc)