Finding duplicates in ABAP internal table via grouping

Question

We all know these excellent ABAP statements which allows finding unique values in one-liner:

it_unique = VALUE #( FOR GROUPS value OF <line> IN it_itab 
                     GROUP BY <line>-field WITHOUT MEMBERS ( value ) ).

But what about extracting duplicates? Can one utilize GROUP BY syntax for that task or, maybe, table comprehensions are more useful here?

The only (though not very elegant) way I found is:

LOOP AT lt_marc ASSIGNING FIELD-SYMBOL(<fs_marc>) GROUP BY ( matnr = <fs_marc>-matnr 
                                                             werks = <fs_marc>-werks )
                                                  ASSIGNING FIELD-SYMBOL(<group>).
  members = VALUE #( FOR m IN GROUP <group> ( m ) ).

  IF lines( members ) > 1.
    "throw error
  ENDIF.

ENDLOOP.

Is there more beautiful way of finding duplicates by arbitrary key?

Suncatcher · Accepted Answer · 2018-05-24T14:03:59.377

So, I just put it as answer, as we with Florian weren't able to think out something better.
If somebody is able to improve it, just do it.

TYPES tt_materials TYPE STANDARD TABLE OF marc WITH DEFAULT KEY. 

DATA duplicates TYPE tt_materials. 
LOOP AT materials INTO DATA(material) 
GROUP BY ( id = material-matnr 
           status = material-pstat 
           size = GROUP SIZE ) 
ASCENDING REFERENCE INTO DATA(group_ref). 

CHECK group_ref->*-size > 1. 
duplicates = VALUE tt_materials( BASE duplicates FOR <status> IN GROUP group_ref ( <status> ) ). 

ENDLOOP.

Florian · Answer 2 · 2018-03-26T08:50:28.207

1

Given

TYPES: BEGIN OF key_row_type,
         matnr TYPE matnr,
         werks TYPE werks_d,
       END OF key_row_type.
TYPES key_table_type TYPE
  STANDARD TABLE OF key_row_type
  WITH DEFAULT KEY.

TYPES: BEGIN OF group_row_type,
         matnr TYPE matnr,
         werks TYPE werks_d,
         size  TYPE i,
       END OF group_row_type.
TYPES group_table_type TYPE
  STANDARD TABLE OF group_row_type
  WITH DEFAULT KEY.

TYPES tt_materials TYPE STANDARD TABLE OF marc WITH DEFAULT KEY.
DATA(materials) = VALUE tt_materials(
  ( matnr = '23' werks = 'US' maabc = 'B' )
  ( matnr = '42' werks = 'DE' maabc = 'A' )
  ( matnr = '42' werks = 'DE' maabc = 'B' ) ).

When

DATA(duplicates) =
  VALUE key_table_type(
    FOR key IN VALUE group_table_type(
      FOR GROUPS group OF material IN materials
      GROUP BY ( matnr = material-matnr
                 werks = material-werks
                 size  = GROUP SIZE )
      WITHOUT MEMBERS ( group ) )
    WHERE ( size > 1 )
    ( matnr = key-matnr
      werks = key-werks ) ).

Then

cl_abap_unit_assert=>assert_equals(
    act = duplicates
    exp = VALUE tt_materials( ( matnr = '42' werks = 'DE') ) ).

Readability of this solution is so bad that you should only ever use it in a method with a revealing name like collect_duplicate_keys.

Also note that the statement's length increases with a growing number of key fields, as the GROUP SIZE addition requires listing the key fields one by one as a list of simple types.

edited Mar 26 '18 at 08:50

answered Mar 24 '18 at 22:54

Florian

4,821
2
19
44

`whether you want the table of extracted duplicates to contain duplicates itself, or reduce them to unique keys`, as the solution to the latter was given in my question, you could guess I want the former :) I want to extract duplicates to separate table without affecting the initial table – Suncatcher Mar 25 '18 at 09:00
Your question actually only provides a solution to "how do I throw an error if my table contains duplicates". – Florian Mar 26 '18 at 06:58
check the first snippet with `it_unique` table, it does exactly your first use-case, which you already removed from the answer – Suncatcher Mar 26 '18 at 07:16
Okay. Updated the answer. – Florian Mar 26 '18 at 07:32
`Readability of this solution is so bad `, yep, it's horrible :) Your first variant was much better. I made a [solution based on it](https://pastebin.com/vELdxhHn). Can we make it more concise? Can we exclude unique groups (size=1) some way except explicit checking inside loop? – Suncatcher Mar 26 '18 at 13:11
Actually I wasn't able to implement your current solution to anything except `werks`. It fails with error alike `group row type doesn't fit to group`. Does it depends on DB key? – Suncatcher Mar 26 '18 at 13:13
No, it doesn't depend on the database key. Using `MARC` for the typing is arbitrary. However, this could relate to the NetWeaver release, as the used constructs are relatively new. This exact code runs on NW 7.53 SP00, the base of S/4HANA 1805. – Florian Mar 26 '18 at 14:35
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/167569/discussion-between-florian-and-suncatcher). – Florian Mar 26 '18 at 14:50

score 0 · Answer 3 · answered Feb 22 '18 at 08:28

0

What about the classics? I'm not sure if they are deprecated or so, but my first think is about to create a table clone, DELETE ADJACENT-DUPLICATES on it and then just compare both lines( )... I'll be eager to read new options.

answered Feb 22 '18 at 08:28

VXLozano

317
1
7

Classics is not convenient as it requires deleting, so one need additional temp itab to accomplish the task. In case table consists of millions records it's not cool. – Suncatcher Feb 22 '18 at 08:35
Ok, I see your point, and I really appreciate -and like- it. So, the answer will be the use of that new sentences I'm not familiar with. I'll keep my answer here instead of delete it to prevent further oldies to write the same answer ;) – VXLozano Feb 22 '18 at 08:38
Yep. The beauty of `VALUE` operator is that it constructs new table from initial and the initial table is **untouched**. – Suncatcher Feb 22 '18 at 08:40
COLLECT is still a bad option? If you are creating a new useless table anyways, maybe you can COLLECT your "key" fields and a value of 1 into that new table and just raise your error when any row reaches value of 2... Again, I'm not sure if I'm wrong, but while we wait for a better answer, this conversation is useful (for me, at least ;) ) – VXLozano Feb 22 '18 at 08:44
How do you raise error via `COLLECT`? In case of violation of primary key in a COLLECT structure it just results in a **non-handleable** exception (short-dump). Not cool :) – Suncatcher Feb 22 '18 at 08:53
It was just a blind shoot... There is no PK violation in a COLLECT, because if a PK is duplicated and no numbers COLLECT'ed, the "new" row is just ignored. The idea is to COLLECT that 1 value, so if there are no duplicates, your new table will have all its "value" fields with 1, but if you have a single 2, you have duplicates. You can check it at each iteration: LOOP-COLLECT-READ TABLE WITH KEY value = 2. subrc = 0 -> error – VXLozano Feb 22 '18 at 08:58
All in all you cannot catch the moment of key duplication (or collecting), you gotta analyze collected table afterwards and that is bummer. It's not a solution. – Suncatcher Feb 22 '18 at 09:12
`You can check it at each iteration: LOOP-COLLECT-READ TABLE WITH KEY value = 2. subrc = 0 -> error` That's what I call bummer :) – Suncatcher Feb 22 '18 at 09:13
I supposed. As I told you, it was a blind shot. Chat warning raised, let's keep quiet from now :P (but thanks for your answers; I've learnt few things about performance thinking about them. – VXLozano Feb 22 '18 at 09:15

Finding duplicates in ABAP internal table via grouping

3 Answers3

Linked

Related