0

Need help in effective memory management for below scenario. I am fetching data from two different databases and comparing data in Java.(currently testing on single database with two queries).

As 9.8 million records needs to be compared, I am copying 50k records each time and loading into ArrayList and comparing using Binarysearch. Though I am clearing (assigning to null and running gc) the arraylist after every iteration, I am getting Heap space error(Assigned 1GB RAM) after comparing 2.5 million records.

Where is the memory leakage in my query?

Query1= select empno,ename from table1 order by empno;
Query2= select empno,ename from table2 order by empno;

ResultSet rs1 = st1.executeQuery(query1);
ResultSet rs2 = st2.executeQuery(query2);               
for (;;) {
    ArrayList<String> al = new ArrayList<String>();
    ArrayList<String> al1 = new ArrayList<String>();

    if (totalRecords1 == Ubound)
        break;

    Lbound = Ubound + 1;
    Ubound = min(Ubound + 50000, totalRecords1);
    System.out.println("Lbound : " + Lbound);
    System.out.println("Ubound : " + Ubound);

    for (int i = Lbound; i <= Ubound; i++) {
        recordConcat1 = ""; recordConcat2 = "";
        String recordConcat1 = "", recordConcat2 = "";
        rs1.next();
        rs2.next();

        recordConcat1 = recordConcat1 + rs1.getString(z) + " ǀ ";
        recordConcat2 = recordConcat2 + rs2.getString(z) + " ǀ ";

        al.add(recordConcat1);

        al1.add(recordConcat2);

    }  /* End of First Lap */ 

    System.out.println("End of Lap : "+lap++);

    int index =0;
    for(int like=0;like<al.size();like++) {

        if(Collections.binarySearch(al1,al.get(like))>=0)
            continue;
        else {
            System.out.println("Not matched : "+ al.get(like));
            break;
        }
    }

    al =null;
    al1=null;
    System.gc();

} /* End of Infinite Loop */ 
Mark Rotteveel
  • 100,966
  • 191
  • 140
  • 197
  • Which database + JDBC driver are you using? – Mark Rotteveel Feb 07 '17 at 16:15
  • 1
    Why don't you create a view in one database and select it from the another database using `join` statement and leave the "hard work" to the database, that knows how to manage billions of records instead of do this in Java? – igventurelli Feb 07 '17 at 16:23
  • First of calling System.gc() just hints the VM that garbage collection should run. With this thing in mind you can create the 2 ArrayList outside the for loop and clear them instead. – xyclops Feb 07 '17 at 16:34
  • This post shoud offer you some clues http://stackoverflow.com/questions/2826319/using-hibernates-scrollableresults-to-slowly-read-90-million-records .good luck. – xyclops Feb 07 '17 at 16:40
  • @mark : oracle jdbc driver. – Elisetty Narendra Feb 07 '17 at 17:14
  • @igor..: I need to compare data in heterogeneous db. So i dont have data in same database. – Elisetty Narendra Feb 07 '17 at 17:15
  • Thanks xyclops.. Does the resultset loads all into memory or is it just a pointer to records in database. Why am i not getting heapspace error after retrieving the resultset object but only after iterating over 2.5 million records?? – Elisetty Narendra Feb 07 '17 at 17:23
  • 1
    @xyclops That question is about MySQL, which is notorious for losing the entire result set into memory immediately. – Mark Rotteveel Feb 07 '17 at 18:04
  • You may want to verify that your Statements are being created as `ResultSet.TYPE_FORWARD_ONLY` and `ResultSet.CONCUR_READ_ONLY` to help minimize the amount of memory they require. – Gord Thompson Feb 07 '17 at 18:14
  • @MarkRotteveel - *"MySQL ... losing the entire result set"* - Now now, be nice. ;) – Gord Thompson Feb 07 '17 at 18:17
  • @GordThompson haha, oops, that should have been loading. That's what you get with Swype and not proofreading... – Mark Rotteveel Feb 07 '17 at 18:34
  • 1
    @ElisettyNarendra yep! I know it. So, can't you create a `view` in one database and do the select from the other database? You can conect both databases with `dblink`. Those dbs are different? For example: one is Oracle and the other is MySQL? Or are the same? If are the same you can easily connect one to other to make this select! – igventurelli Feb 07 '17 at 18:46
  • did you solved it? – xyclops Feb 08 '17 at 19:12

1 Answers1

0

Instead of setting the ArrayLists to null and calling GC. Call ArrayList.clear() That will not release the memory of the ArrayLists but reuse it. Moreover, because you have a fixed upper bound at 50000, pass that number to the constructor of ArrayList to avoid dynamic memory reallocation while automatically growing the ArrayLists from their default initial size 10.

On the other side, if I read properly, what you are trying to achieve is tell whether every element of one table is at the other. Because the data is sorted you can do this without loading chunks into memory but reading both ResultSets in parallel.

I think that your algorithm will not work properly even without the out of memory error. You read 50K from table A and 50K from table B. Now search each element of chunk B on chunk A. But what if an element within the first 50000 of B is at position 50001 of A. You'll think it's not there but actually it is. Right?

Serg M Ten
  • 5,568
  • 4
  • 25
  • 48
  • I agree that if any record doesnt exist in either tables,the comparison may go wrong.. I have handled such scenarios outside the code produced here..i tried first using arraylist.clear() before.. That didnt work. After some analysis i pointed arrays to null and ran GC. I also tried comparing directly iterating over both resultset buthad to avoid that approach due to performance isssues. The current approach of using bineary search seems very very faster . – Elisetty Narendra Feb 07 '17 at 17:13
  • I don´t see why parallel iteration should be any slower than pre-load + binary search. If your database and JDBC drive support it, use setFetchSize to fetch a large number of rows on each round trip. That will give you a great performance gain if the feature is available. Moreover reusing the ArrayLists you could reuse also recordConcat1 and recordConcat2 by converting them into StringBuilder objects and clearing them on each iiteration. – Serg M Ten Feb 07 '17 at 18:17