Compare one row of resultset with remaining all and find duplicates

Question

I am trying to find duplicate rows in my result set. The result set holds a large amount of data; what is the best way to find duplicates in it? I have also tried using an ArrayList.

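import java.sql.*;
import java.util.ArrayList;
import java.util.List;

// Flat list that receives every column value of every row.
List<String> inner = new ArrayList<String>();

try {
    Class.forName("com.mysql.jdbc.Driver");
    // try-with-resources closes the connection, statement and result set,
    // even when an exception is thrown.
    try (Connection con = DriverManager.getConnection("jdbc:mysql://localhost:3306/mydb", "root", "");
         Statement stmt = con.createStatement();
         ResultSet rs = stmt.executeQuery("select * from mydata_table where srno<1000")) {
        ResultSetMetaData rsmd = rs.getMetaData();
        int columnNumber = rsmd.getColumnCount();
        while (rs.next()) {
            // JDBC columns are 1-based, so the last column has index columnNumber.
            for (int i = 1; i <= columnNumber; i++) {
                inner.add(rs.getString(i));
            }
        }
    }
    System.out.println("\n" + inner);
} catch (Exception e) {
    System.out.println(e);
}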

I want to compare each row with the remaining rows and find the duplicate rows in the list.

Then why don't you use the `DISTINCT` clause in SQL? That would be a cleaner solution. — Pradeep Simha, Aug 17 '16 at 12:46
This sounds like something that should be done with SQL rather than the `ResultSet`. — bradimus, Aug 17 '16 at 12:46
I've tried the DISTINCT clause, but it didn't return unique records in my case. — iks_in, Aug 17 '16 at 12:49
You can do all of that in SQL; see http://stackoverflow.com/questions/854128/find-duplicate-records-in-mysql for an answer to get you started. — RiggsFolly, Aug 17 '16 at 12:55
Instead of an `ArrayList` you can use a `Set`; that would solve it (at least with the code you have pasted). — Pradeep Simha, Aug 17 '16 at 12:56
Yes if you're using only one column from resultset like in your code. — Pradeep Simha, Aug 17 '16 at 13:01
No, there are more than 40 columns in my resultset. I am a bit confused about how to solve it. What is the best approach? — iks_in, Aug 17 '16 at 13:02
Welcome to Stack Overflow! Can you please have a better title and more detailed information in the content with your effort to solve the problem? — Enamul Hassan, Aug 19 '16 at 16:56
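
The `Set` suggestion in the comments above can be stretched to rows with many columns if each row is stored as its own `List<String>`, because `List.equals` and `List.hashCode` compare element by element. The following is only a minimal sketch under that assumption; the class name `DuplicateRowFinder` and the methods `readRows` and `findDuplicates` are hypothetical names chosen for illustration, not part of the original code.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DuplicateRowFinder {

    // Hypothetical helper: copies every column of every row into one List<String> per row.
    static List<List<String>> readRows(ResultSet rs) throws SQLException {
        ResultSetMetaData rsmd = rs.getMetaData();
        int columnCount = rsmd.getColumnCount();
        List<List<String>> rows = new ArrayList<>();
        while (rs.next()) {
            List<String> row = new ArrayList<>(columnCount);
            for (int i = 1; i <= columnCount; i++) { // JDBC columns are 1-based
                row.add(rs.getString(i));
            }
            rows.add(row);
        }
        return rows;
    }

    // Returns each row that occurs more than once, listed a single time,
    // in the order in which its first repetition was encountered.
    static List<List<String>> findDuplicates(List<List<String>> rows) {
        Set<List<String>> seen = new HashSet<>();
        Set<List<String>> duplicates = new LinkedHashSet<>();
        for (List<String> row : rows) {
            // add() returns false when an equal row is already in the set
            if (!seen.add(row)) {
                duplicates.add(row);
            }
        }
        return new ArrayList<>(duplicates);
    }
}
```

This keeps every row in memory, so it fits a bounded query such as the `srno<1000` one better than a whole table. If a column holds a different value on every row (a serial number might), no two full rows will ever be equal, and that column would have to be left out of the lists being compared.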

score 0 · Answer 1 · answered Aug 17 '16 at 12:57

If you sort the records in the query (probably with ORDER BY ...), you then only need to compare each record with its successor.

OldCurmudgeon

Thank you for your help. But there are more than 40 columns in it; which column should I order by? – iks_in Aug 17 '16 at 13:01
@iks_in - All of them if you like. Or as many of them as will make the duplicate check easy. If, for example, columns 1, 2 and 3 are generally unique in most records, sort by them and then only search forward while all of them are the same. – OldCurmudgeon Aug 17 '16 at 13:04
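
The sort-then-compare idea from this answer could look roughly like the sketch below. It assumes the rows were already fetched with an ORDER BY on the columns being compared, and that each row is held as a `List<String>`; the class `SortedDuplicateScan`, its methods, and the zero-based `keyColumns` parameter are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class SortedDuplicateScan {

    // True when both rows hold equal values in every key column.
    // Objects.equals treats two nulls (SQL NULLs read via getString) as equal.
    static boolean sameKey(List<String> a, List<String> b, int[] keyColumns) {
        for (int c : keyColumns) {
            if (!Objects.equals(a.get(c), b.get(c))) {
                return false;
            }
        }
        return true;
    }

    // Assumes sortedRows is ordered on keyColumns (zero-based list indexes),
    // so equal rows sit next to each other. Returns every row whose key
    // matches the key of the row directly before it.
    static List<List<String>> duplicatesInSortedRows(List<List<String>> sortedRows,
                                                     int[] keyColumns) {
        List<List<String>> duplicates = new ArrayList<>();
        for (int i = 1; i < sortedRows.size(); i++) {
            if (sameKey(sortedRows.get(i - 1), sortedRows.get(i), keyColumns)) {
                duplicates.add(sortedRows.get(i));
            }
        }
        return duplicates;
    }
}
```

Because only neighbouring rows are compared, the same loop could in principle run directly over a `ResultSet` while remembering just the previous row, which avoids loading all rows at once.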
