Duplicate entry whice using distinct keyword in query

Question

I am using java to execute some SQL Queries. Some of them are Getting data from one database(A) and storing in a table in another database(B).After process is done i am deleting all data from table in database(B). I am repeating this process every 5 mins. Code:

         String sql = "delete from newtable";
         stmt5 = conn.prepareStatement(sql);
         stmt5.executeUpdate(sql);
         String sql_1 = "select distinct tbl_alm_log_2000000000.Csn, tbl_alm_log_2000000000.IsCleared, tbl_alm_log_2000000000.Id,tbl_alm_log_2000000000.NEType, tbl_alm_log_2000000000.OccurTime, tbl_alm_log_2000000000.hostIP, tbl_alm_log_2000000000.ExtendInfo From fmdb.dbo.tbl_alm_log_2000000000 Where IsCleared = 0";

         ResultSet rs =  stmt_1.executeQuery(sql_1);



        String sql_2 = "insert into newtable (CSN, IsCleared, Id, NEType, OccurTime, hostIP) values(?,?,?,?,?,?)";


        conn.setAutoCommit(false);
         PreparedStatement ps = conn.prepareStatement(sql_2);

         final int batchSize = 1000;
         int count = 0;

         while (rs.next()){                 
                 ps.setString(1, rs.getString(1)); //csn
                 ps.setString(2, rs.getString(2));  //iscleared
                 ps.setString(3, rs.getString(3));//id
                 ps.setString(4, rs.getString(4));//netype
                 ps.setString(5, rs.getString(5));//occurtime
                 ps.setString(6, rs.getString(6));//hostip


             ps.addBatch();

             if(++count % batchSize == 0) {
                 ps.executeBatch();
             }
         }
         ps.executeBatch(); // insert remaining records
         conn.commit();
         ps.close();

It runs perfectly for 10 -20 runs and then gives "duplicate entry error for "value" in Csn as it is Primary key". I added Distinct keyword in query and it is still giving this error after 10-20 runs.

Note: I m deleting data from newtable befor start of process so it is always adding in a empty table.

Suggest where i am going wrong.

Check whether you have different value for field `ExtendInfo`. You have not used this field in `insert`. If you do not need it. Remove that field from your `select`. — Viki888, Jan 23 '17 at 12:14
I removed the incompatible database tags. Please tag with the database you are really using. — Gordon Linoff, Jan 23 '17 at 12:15
@viki888 It is being used, but i removed that part as i thought it didnt concern this issue. Should i put it back ? — prateek k, Jan 23 '17 at 12:17
DISTINCT removes duplicate rows, not duplicate columns. If you want to see what is really happening, just add some logging on what you're trying to insert. — john16384, Jan 23 '17 at 12:18
@prateek What I mean is that DISTINCT will not guarantee that the column Csn will not contain duplicates. It only guarantees that the entire row is not a duplicate (ie, all values are the same). — john16384, Jan 23 '17 at 12:22
@prateekk . . . I removed them again (and they were different). MySQL and Sybase are *different* databases. Please use the tag for the one that you are really using. — Gordon Linoff, Jan 23 '17 at 13:14

score 1 · Accepted Answer · edited May 23 '17 at 11:52

Looks like you have misunderstanding about how does distinct work. In query with several selected columns it will search for distinct tuples of values, not for distinct Csn column only.

There are different ways how to select distinct values by one column only. It generally depends on particular DBMS you use and logic you want to apply for multiply tuples found for same Csn column values. Consider for instance this question: DISTINCT for only one Column

One of general ideas: select distinct single values for Csn column only, then iterate through this list and select first tuple of values with this Csn value (I don't know is it suitable for you select first tuple or not).

score 0 · Answer 2 · answered Jan 23 '17 at 12:37

when you insert the data , you can add if not exists not make sure your data is unique ( i considered CSN only column in PK)

if not exists(select 1 from tbl_alm_log_2000000000  where CSN=? ) 
  insert into newtable (CSN, IsCleared, Id, NEType, OccurTime, hostIP) values(?,?,?,?,?,?)

Duplicate entry whice using distinct keyword in query

2 Answers2