
I have an internal table with 2 million rows that's been uploaded from a file. I want to delete any lines that are duplicates and extract the row numbers of the duplicates and add them to another table. What's the best/most efficient way to do this with ABAP 7.40? Classic ABAP is also fine.

So here's an example of my original table. I want to find duplicates by comparing columns A and B:

A  | B  | C
-----------
a1 | b1 | c1
a1 | b2 | c1
a2 | b1 | c2
a1 | b1 | c2
a2 | b2 | c2

Rows 1 and 4 are duplicates (same A and B), so I'd want to remove both of them to end up with

A  | B  | C
-----------
a1 | b2 | c1
a2 | b1 | c2
a2 | b2 | c2

and also have another table that stores duplicates:

Row number  | Error 
-------------------
1           | Duplicate
4           | Duplicate      

I've seen similar requests on this site but they work a bit differently to what I need. Thanks.

Suncatcher
mmgro27
  • What have you tried? At least post the structure of this table here. The easiest way would be to copy the table, sort it and then delete adjacent duplicates. Then, to get the duplicates themselves, you could use the solution [here](https://stackoverflow.com/questions/54907235/difference-of-two-sets-of-values) – Jagger Mar 06 '19 at 14:02
  • Sorry about that - I've added more details to the post. With your method I think I'd then need to loop through the whole table again to delete the duplicates and find row numbers. Do you think this is an efficient way of going about this or can you think of a better way now you can see my full requirement? – mmgro27 Mar 06 '19 at 16:55
  • So you probably found this [question](https://stackoverflow.com/questions/48810878/finding-duplicates-in-abap-internal-table-via-grouping). Why doesn't it answer your question? – Sandra Rossi Mar 06 '19 at 16:56
  • @SandraRossi my requirement is for removing duplicates from a table and storing their row number. That thread was about extracting duplicates which is different. I'll try again to adapt their method for my requirement unless someone suggests a better way. – mmgro27 Mar 06 '19 at 17:04
  • @SandraRossi Re your edit to my post: I'm checking duplicates by comparing columns A and B only, which is why I posted the data like that. – mmgro27 Mar 06 '19 at 17:10
  • You may adapt it a little bit: if size = 1, it's not a duplicate; if size > 1, it's a duplicate. The only issue left is to get the row numbers of the duplicates. – Sandra Rossi Mar 06 '19 at 17:12
  • @mmgro27 thanks, I rolled back my edit. I'm not sure that having column C serves the question (always provide a minimal example and code). – Sandra Rossi Mar 06 '19 at 17:16
  • Possible duplicate of [Finding duplicates in ABAP internal table via grouping](https://stackoverflow.com/questions/48810878/finding-duplicates-in-abap-internal-table-via-grouping) – Suncatcher Mar 06 '19 at 20:45
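Jagger's sort-based idea can be stretched to cover the whole requirement in mostly classic ABAP: keep the original row number in the copy, sort the copy by A and B, and flag every row whose sorted neighbour has the same key. The following is only a sketch under those assumptions; the program name and the helper types ty_key_row and ty_error are hypothetical, and the sample data is the one from the question.

```abap
REPORT zdemo_dup_rows_sorted. " hypothetical program name

TYPES: BEGIN OF ty_line,
         a TYPE c LENGTH 2,
         b TYPE c LENGTH 2,
         c TYPE c LENGTH 2,
       END OF ty_line,
       ty_lines TYPE STANDARD TABLE OF ty_line WITH EMPTY KEY,
       " Hypothetical helper: comparison columns plus original row number
       BEGIN OF ty_key_row,
         a   TYPE c LENGTH 2,
         b   TYPE c LENGTH 2,
         row TYPE i,
       END OF ty_key_row,
       ty_key_rows TYPE STANDARD TABLE OF ty_key_row WITH EMPTY KEY,
       " Hypothetical error table line: row number and message
       BEGIN OF ty_error,
         row   TYPE i,
         error TYPE string,
       END OF ty_error,
       ty_errors TYPE STANDARD TABLE OF ty_error WITH EMPTY KEY.

DATA(itab) = VALUE ty_lines(
  ( a = 'a1' b = 'b1' c = 'c1' )
  ( a = 'a1' b = 'b2' c = 'c1' )
  ( a = 'a2' b = 'b1' c = 'c2' )
  ( a = 'a1' b = 'b1' c = 'c2' )
  ( a = 'a2' b = 'b2' c = 'c2' ) ).

" 1. Copy A, B and the original row number, then sort by the key
DATA(key_rows) = VALUE ty_key_rows(
  FOR <l> IN itab INDEX INTO idx ( a = <l>-a b = <l>-b row = idx ) ).
SORT key_rows BY a b row.

" 2. Equal neighbours in the sorted copy mean the key is duplicated
DATA dup_rows TYPE STANDARD TABLE OF i WITH EMPTY KEY.
DATA prev TYPE ty_key_row.
DATA prev_flagged TYPE abap_bool.
LOOP AT key_rows INTO DATA(cur).
  IF sy-tabix > 1 AND cur-a = prev-a AND cur-b = prev-b.
    IF prev_flagged = abap_false.
      APPEND prev-row TO dup_rows. " first member of this run of equal keys
    ENDIF.
    APPEND cur-row TO dup_rows.
    prev_flagged = abap_true.
  ELSE.
    prev_flagged = abap_false.
  ENDIF.
  prev = cur.
ENDLOOP.
SORT dup_rows BY table_line.

" 3. One error line per duplicated row
DATA(errors) = VALUE ty_errors(
  FOR r IN dup_rows ( row = r error = `Duplicate` ) ).

" 4. Keep only the rows whose number is not in the duplicate set
DATA dup_set TYPE HASHED TABLE OF i WITH UNIQUE KEY table_line.
dup_set = dup_rows.
DATA kept TYPE ty_lines.
LOOP AT itab ASSIGNING FIELD-SYMBOL(<line>).
  DATA(row_number) = sy-tabix.
  IF NOT line_exists( dup_set[ table_line = row_number ] ).
    APPEND <line> TO kept.
  ENDIF.
ENDLOOP.
itab = kept.
```

The hashed set makes the final membership check cheap per row, and rebuilding itab from kept avoids deleting from the table while looping over it; the trade-off is that a second copy of the kept rows exists briefly.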

1 Answer


This is the code to find which lines are duplicates (valid for ABAP >= 7.40):

```abap
" Sample data from the question; duplicates are detected on columns A and B only
TYPES: BEGIN OF ty_line,
         a TYPE c LENGTH 2,
         b TYPE c LENGTH 2,
         c TYPE c LENGTH 2,
       END OF ty_line,
       ty_lines TYPE STANDARD TABLE OF ty_line WITH EMPTY KEY.

DATA(itab) = VALUE ty_lines(
  ( a = 'a1' b = 'b1' c = 'c1' )
  ( a = 'a1' b = 'b2' c = 'c1' )
  ( a = 'a2' b = 'b1' c = 'c2' )
  ( a = 'a1' b = 'b1' c = 'c2' )
  ( a = 'a2' b = 'b2' c = 'c2' ) ).

" Group the rows by (A, B); for each group with more than one member,
" produce one string holding the members' row numbers separated by commas
DATA(duplicates) = VALUE string_table(
    FOR GROUPS <group> OF <line> IN itab
    GROUP BY ( a = <line>-a b = <line>-b size = GROUP SIZE )
    ( LINES OF COND #( WHEN <group>-size > 1 THEN VALUE string_table( (
        concat_lines_of(
            table = VALUE string_table(
                    FOR <line2> IN GROUP <group> INDEX INTO tabix ( |{ tabix }| ) )
            sep   = ',' ) ) ) ) ) ).

" Rows 1 and 4 share A = a1 and B = b1
ASSERT duplicates = VALUE string_table( ( `1,4` ) ).
```

I use LINES OF so that no line is generated for a group of size 1: the COND has no ELSE branch, so for such a group it returns an empty table.
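The code above yields one string such as 1,4 per duplicated key and leaves itab unchanged. If you also need one error line per row and the duplicated rows removed, a possible sketch is below. Note that it swaps the GROUP BY for a different technique: a hashed counter table filled with COLLECT, followed by a second pass that uses sy-tabix as the row number. The program name and the types ty_count and ty_error are hypothetical; the data is the sample from the question.

```abap
REPORT zdemo_dup_rows_hashed. " hypothetical program name

TYPES: BEGIN OF ty_line,
         a TYPE c LENGTH 2,
         b TYPE c LENGTH 2,
         c TYPE c LENGTH 2,
       END OF ty_line,
       ty_lines TYPE STANDARD TABLE OF ty_line WITH EMPTY KEY,
       " Hypothetical counter: how often each (A, B) key occurs
       BEGIN OF ty_count,
         a     TYPE c LENGTH 2,
         b     TYPE c LENGTH 2,
         count TYPE i,
       END OF ty_count,
       ty_counts TYPE HASHED TABLE OF ty_count WITH UNIQUE KEY a b,
       " Hypothetical error table line: row number and message
       BEGIN OF ty_error,
         row   TYPE i,
         error TYPE string,
       END OF ty_error,
       ty_errors TYPE STANDARD TABLE OF ty_error WITH EMPTY KEY.

DATA(itab) = VALUE ty_lines(
  ( a = 'a1' b = 'b1' c = 'c1' )
  ( a = 'a1' b = 'b2' c = 'c1' )
  ( a = 'a2' b = 'b1' c = 'c2' )
  ( a = 'a1' b = 'b1' c = 'c2' )
  ( a = 'a2' b = 'b2' c = 'c2' ) ).

DATA counts TYPE ty_counts.
DATA count_line TYPE ty_count.
DATA errors TYPE ty_errors.
DATA kept TYPE ty_lines.

" Pass 1: COLLECT adds COUNT to an existing key or inserts a new line
LOOP AT itab ASSIGNING FIELD-SYMBOL(<line>).
  count_line = VALUE #( a = <line>-a b = <line>-b count = 1 ).
  COLLECT count_line INTO counts.
ENDLOOP.

" Pass 2: rows whose key occurs more than once go to the error table,
" all other rows are copied to KEPT
LOOP AT itab ASSIGNING <line>.
  DATA(row_number) = sy-tabix. " row number in the uploaded table
  IF counts[ a = <line>-a b = <line>-b ]-count > 1.
    APPEND VALUE #( row = row_number error = `Duplicate` ) TO errors.
  ELSE.
    APPEND <line> TO kept.
  ENDIF.
ENDLOOP.

itab = kept.
```

Both passes are plain loops with hashed lookups, so no sort is needed. Copying the kept rows into a new table, instead of deleting inside the loop, means two copies of the data exist briefly; whether that matters for 2 million rows depends on the row width.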

Sandra Rossi