
In a project I am working on, I am given a list that contains a little under 1 million lines. The data maps all possible origins (000-999) to all possible destinations (000-999).

For each combination, I need to check whether the database already contains a record with that combination. If no record exists, it will be added to the database; if the record does exist, it will be updated with the new information.

The origin and destination form the composite primary key of the table and are indexed. This is all on an Oracle database.

Given that I have to do this 1 million times, what is the best possible approach? My current method takes upwards of an hour to process all the records.

For actually inserting and updating the records, I am using a batch query process that doesn't take much time at all.

The part that appears to take the most time is querying the database for the count of existing records.

public String batchUpdateModes(List<BatchFileCommand> records, String user) throws TransactionException {
    String message = "";
    List<TransModeObj> updateList = new ArrayList<TransModeObj>();
    List<TransModeObj> insertList = new ArrayList<TransModeObj>();
    Connection conn = null;
    PreparedStatement stmt = null;
    try {
        conn = getDao().getConnection();
        // Prepare the lookup once, with bind variables, instead of creating
        // a new Statement and concatenating the key values for every row.
        stmt = conn.prepareStatement(
                "select count(*) from trans_mode where orig_zip = ? and dest_zip = ?");
        for (BatchFileCommand record : records) {
            String origin = record.getOrigZip().trim();
            String dest = record.getDestZip().trim();
            String pri = record.getPriMode().trim();
            String fcm = record.getFcmMode().trim();
            String per = record.getPerMode().trim();
            String pkg = record.getPkgMode().trim();
            String std = record.getStdMode().trim();
            String effDate = record.getEffDate();
            String discDate = "";

            TransModeObj obj = new TransModeObj(origin, dest, pri, fcm, per, pkg, std, effDate, discDate);
            obj.setUserId(user);

            stmt.setString(1, origin);
            stmt.setString(2, dest);
            ResultSet rs = stmt.executeQuery();
            int count = 0;
            if (rs.next()) {
                count = rs.getInt(1);
            }
            rs.close();
            if (count > 0) {
                updateList.add(obj);
            } else {
                insertList.add(obj);
            }
        }
    } catch (SQLException e) {
        e.printStackTrace();
        message = e.getMessage();
    } finally {
        try {
            if (stmt != null) stmt.close();
            if (conn != null) conn.close();
        } catch (SQLException e) {
            e.printStackTrace();
        }
    }
    recordCount[0] = updateList.size();
    recordCount[1] = insertList.size();
    boolean success = insertTransModes(insertList);
    System.out.println("Inserts Complete");
    success = updateTransModes(updateList) && success;
    System.out.println("Updates Complete");

    if (success) {
        message = "success";
    } else {
        message = "The changes or additions submitted could not be completed.";
    }

    return message;
}
Evidencex
  • Just some advice: if your data is completely formatted, then you can simply use SQL*Loader. – Aditya Peshave Jan 31 '14 at 21:31
  • The variant with `MERGE` is good, but if you load a big amount of data from an external source, there are many approaches that provide good performance. Please review the links from [this answer](http://stackoverflow.com/a/3757726/232279), especially the variant with [SQL*Loader](http://docs.oracle.com/cd/E11882_01/server.112/e22490/ldr_concepts.htm#SUTIL978). – ThinkJet Jan 31 '14 at 21:51

2 Answers


The easiest solution is to ditch the counts and just use a MERGE statement. This allows the database to figure out whether to insert or update each row in a single SQL statement. Find out more.
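For illustration, a MERGE for this table might look roughly like the following. The key columns come from the question; the non-key column name (`pri_mode`) is a guess based on the getters in the posted Java code, and the `?` placeholders are JDBC bind variables so the statement can be run through a batched `PreparedStatement`:

```sql
MERGE INTO trans_mode t
USING (SELECT ? AS orig_zip, ? AS dest_zip, ? AS pri_mode FROM dual) s
ON (t.orig_zip = s.orig_zip AND t.dest_zip = s.dest_zip)
WHEN MATCHED THEN
  UPDATE SET t.pri_mode = s.pri_mode
WHEN NOT MATCHED THEN
  INSERT (orig_zip, dest_zip, pri_mode)
  VALUES (s.orig_zip, s.dest_zip, s.pri_mode)
```

With one MERGE per row sent through `addBatch()`, the separate "does it exist?" round trip disappears entirely.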

The one drawback with MERGE is that the rowcount doesn't distinguish between rows updated and rows inserted. This is probably a cheap price to pay for the overall time saved. Although if you really can't do without separate counts, Adrian Billington has a workaround for you.

APC

As APC mentioned, MERGE is a good option when you need to either insert or update. But it may update records you didn't wish to update.

First question is what's the primary key that uniquely identifies your records (is it a composition of several fields)?

Another approach could be to load all existing records' primary keys into memory beforehand and rule out duplicates from the records list (assuming you have the required amount of RAM).
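A minimal sketch of that idea in plain Java, with the JDBC part left out as an assumption (in practice you would fill the key set from a single bulk `SELECT orig_zip, dest_zip FROM trans_mode` instead of one query per row). The class and method names here are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class KeyPartition {

    // Split incoming records into inserts and updates using an in-memory
    // set of existing primary keys, so no per-row SELECT is needed.
    // Each record is a {origin, destination} pair.
    static void partition(Set<String> existingKeys, List<String[]> records,
                          List<String[]> inserts, List<String[]> updates) {
        for (String[] rec : records) {
            String key = rec[0] + "|" + rec[1];   // composite key: origin|destination
            if (existingKeys.contains(key)) {
                updates.add(rec);
            } else {
                inserts.add(rec);
            }
        }
    }

    public static void main(String[] args) {
        // Existing keys would normally come from one bulk SELECT.
        Set<String> existing = new HashSet<String>();
        existing.add("001|002");

        List<String[]> records = new ArrayList<String[]>();
        records.add(new String[] {"001", "002"});  // already in DB -> update
        records.add(new String[] {"003", "004"});  // new -> insert

        List<String[]> inserts = new ArrayList<String[]>();
        List<String[]> updates = new ArrayList<String[]>();
        partition(existing, records, inserts, updates);
        System.out.println(inserts.size() + " inserts, " + updates.size() + " updates");
    }
}
```

One million short keys in a `HashSet` is well within the memory budget of a typical JVM, and the lookup per record is O(1), so the whole classification pass takes seconds rather than an hour.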

Also take a look at this and that options.

aljipa
  • The Primary key is a composite of the origin and destination which is indexed in the database. – Evidencex Jan 31 '14 at 21:24
  • Check that provided links, hope it helps. – aljipa Jan 31 '14 at 21:28
  • "that may update records you didn't wish to update." Eh? MERGE will only update records identified in the keys provided by the source data. – APC Jan 31 '14 at 23:36
  • The point is that MERGE may update NON-key fields from the provided data, overriding those that are already stored in the DB. – aljipa Feb 01 '14 at 14:54
  • As I'm not sure about the application's logic, I mentioned this just so it is noted. – aljipa Feb 01 '14 at 15:18