I am trying to load a ResultSet into a Set so that I can remove the duplicates and put the result in a separate table, which would simplify the sanitizing process.

But when I try this:

while (rs.next()) {
    set.add(new ABC(rs.getString(1), rs.getString(2), rs.getString(3), rs.getString(4), rs.getString(5),
        rs.getString(6), rs.getString(7), rs.getString(8), rs.getString(9), rs.getString(10),
        rs.getString(11), rs.getString(12), rs.getString(13), rs.getString(14), rs.getString(15),
        rs.getString(16), rs.getString(17), rs.getString(18), rs.getString(19), rs.getString(20),
        rs.getString(21), rs.getString(22), rs.getString(23), rs.getString(24), rs.getString(25),
        rs.getString(26)));
}

After 1 million records, Java throws a GC overhead error. Is there any alternative?

Mark Rotteveel
Meht4409
  • Why? What did you expect? Why not just process the `ResultSet` you already have, row by row? – user207421 Dec 08 '19 at 09:21
  • Because I need to find the duplicates in the result set, and I am putting the rows in a set for that, overriding hashCode and equals – Meht4409 Dec 08 '19 at 09:22
  • Clarified in the question, thanks for pointing that out – Meht4409 Dec 08 '19 at 09:23
  • You might be better off leveraging the database for that. Just find a query or stored proc that can find your duplicates without needing you to load all the data in memory – ernest_k Dec 08 '19 at 09:24
  • So, is this not possible in Java by any alternative? – Meht4409 Dec 08 '19 at 09:26
  • What about fetching the data in chunks? – Chemaclass Dec 08 '19 at 09:27
  • Some simple arithmetic: you have 10 million rows. Each row has 26 strings. Let's assume they're all tiny and only consume 25 bytes. Let's ignore the memory used by Set entries. All this would consume 10,000,000 x 25 x 26 bytes = 6.5 GB of memory. You probably don't have that much memory. And the strings probably consume much more than that. And the set and the ABC instances add memory too. – JB Nizet Dec 08 '19 at 09:28
  • In general you should never attempt to process an entire result set in memory. You can't rely on fitting it all into memory, and it is wasteful to transport it all over the network when you can do the processing at the server side. SQL already provides you with filters, groupings, group totals, all kinds of things. Use them. – user207421 Dec 08 '19 at 09:29
  • I think the error has more to do with creating an object every time I iterate over the result set. – Meht4409 Dec 08 '19 at 09:30
  • More to do with it than what? There is nothing *else* it has to do with. And from your description it isn't *necessary* to do it in Java. – user207421 Dec 08 '19 at 09:30
  • Maybe increasing the Xmx size would help. I also believe you should fetch the data in chunks – Mahmoud Al Siksek Dec 08 '19 at 09:42
  • Related, possibly duplicate: [Error java.lang.OutOfMemoryError: GC overhead limit exceeded](https://stackoverflow.com/questions/1393486/error-java-lang-outofmemoryerror-gc-overhead-limit-exceeded) – Mark Rotteveel Dec 08 '19 at 11:11
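
The back-of-the-envelope estimate from the comments can be reproduced with a few lines of Java. This is only a sketch under the same assumptions as the comment (10 million rows, 26 strings per row, 25 bytes per string); the class and method names are made up for illustration, and the real footprint will be higher because object headers, the Set entries and the ABC instances are ignored.

```java
class RowMemoryEstimate {

    // Lower bound on the string payload: rows x strings per row x bytes per string.
    // Uses long arithmetic because the result does not fit in an int.
    static long estimateBytes(long rows, int stringsPerRow, int bytesPerString) {
        return rows * stringsPerRow * bytesPerString;
    }

    public static void main(String[] args) {
        long needed = estimateBytes(10_000_000L, 26, 25);
        long maxHeap = Runtime.getRuntime().maxMemory(); // heap limit of this JVM (-Xmx)

        System.out.println("Estimated payload: " + needed / 1e9 + " GB");
        System.out.println("Max heap:          " + maxHeap / 1e9 + " GB");
        System.out.println("Fits in heap:      " + (needed < maxHeap));
    }
}
```

Comparing the estimate against `Runtime.getRuntime().maxMemory()` gives a quick indication of whether loading everything into a Set is realistic before trying it.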

2 Answers

8

If the end result you want is a new table containing the original table's data minus duplicates, then this operation should be handled entirely in your database, not in Java:

CREATE TABLE newTable (col1 varchar(50), col2 varchar(50), ..., col26 varchar(50));
INSERT INTO newTable (col1, col2, ..., col26)
SELECT DISTINCT col1, col2, ..., col26
FROM originalTable;
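
If the statement still has to be triggered from Java, it can be sent through JDBC so that no rows ever travel to the client. The following is a minimal sketch, not a definitive implementation: the class `DistinctCopy` and its method names are hypothetical, the target table is assumed to exist already, and table and column names are assumed to come from trusted code rather than user input, since they are concatenated into the SQL text.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

class DistinctCopy {

    // Builds "INSERT INTO target (cols) SELECT DISTINCT cols FROM source".
    // Identifiers are concatenated as-is, so they must not come from user input.
    static String buildInsertDistinct(String targetTable, String sourceTable, List<String> columns) {
        String cols = String.join(", ", columns);
        return "INSERT INTO " + targetTable + " (" + cols + ") "
                + "SELECT DISTINCT " + cols + " FROM " + sourceTable;
    }

    // Runs the de-duplicating copy inside the database and returns the number of rows inserted.
    static int copyDistinct(Connection conn, String targetTable, String sourceTable,
                            List<String> columns) throws SQLException {
        try (Statement st = conn.createStatement()) {
            return st.executeUpdate(buildInsertDistinct(targetTable, sourceTable, columns));
        }
    }
}
```

With this approach the Java side holds only the update count, so heap usage no longer depends on the number of rows.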
Tim Biegeleisen
0

If you cannot do this in any other way than loading all the data into memory, and there is some redundancy in the strings you are extracting from the database, String interning might help.

Use a utility method to handle nulls properly, then wrap all getString calls with it.

while (rs.next()) {
    set.add(new ABC(
        intern(rs.getString(1)), intern(rs.getString(2)), intern(rs.getString(3)),
        intern(rs.getString(4)), intern(rs.getString(5)), intern(rs.getString(6)),
        intern(rs.getString(7)), intern(rs.getString(8)), intern(rs.getString(9)),
        intern(rs.getString(10)), intern(rs.getString(11)), intern(rs.getString(12)),
        intern(rs.getString(13)), intern(rs.getString(14)), intern(rs.getString(15)),
        intern(rs.getString(16)), intern(rs.getString(17)), intern(rs.getString(18)),
        intern(rs.getString(19)), intern(rs.getString(20)), intern(rs.getString(21)),
        intern(rs.getString(22)), intern(rs.getString(23)), intern(rs.getString(24)),
        intern(rs.getString(25)), intern(rs.getString(26))));
}

// Null-safe wrapper: getString returns null for SQL NULL, and calling intern() on null would throw.
private String intern(String string) {
    return string == null ? null : string.intern();
}
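
To see why interning can reduce memory when values repeat, here is a small self-contained sketch. The class name `InternDemo` and the sample value are made up for illustration; the helper is the same null-safe wrapper as above. Two equal strings created separately are distinct objects, while their interned versions are one shared instance from the JVM string pool.

```java
class InternDemo {

    // Null-safe interning, same idea as the helper above.
    static String intern(String s) {
        return s == null ? null : s.intern();
    }

    public static void main(String[] args) {
        // new String(...) forces two separate objects with equal content,
        // similar to the same value being read from two different rows.
        String a = new String("NEW YORK");
        String b = new String("NEW YORK");

        System.out.println(a.equals(b));            // equal content
        System.out.println(a == b);                 // but two objects on the heap
        System.out.println(intern(a) == intern(b)); // one shared pooled instance
    }
}
```

The saving only materializes when column values really do repeat across rows; for mostly unique values, interning adds pool overhead without freeing anything.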

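Running on Java 9 or higher will also help: since Java 9, compact strings store text that contains only Latin-1 characters using one byte per character instead of two, so strings use less memory on average.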

NorthernSky