
I am trying to select around 1 million records from a Redshift table and then insert them back into a Redshift table (after doing some manipulations).

But that is taking very long. I waited around an hour for the program to terminate, with no luck. The console also stops printing my print statements; it seems stuck after the first few.

I tried the same with 100 records; that works fine and takes around 2 minutes.

This is part of my code:

        conn.setAutoCommit(false);
        stmt = conn.createStatement();
        stmt.setFetchSize(100);
        ResultSet rsSelect = stmt.executeQuery("select * from table");
        System.out.println("select done !");

        String queryInsert = "insert into table"
                +"(event_id,domain_userid,collector_tstamp,se_category,se_action,se_label,se_property)"
                +"values(?,?,?,?,?,?,?)";

        PreparedStatement preparedStatement = conn.prepareStatement(queryInsert);
        final int batchSize = 10000;
        int count = 0;
        System.out.println("about to go into loop !");


        while(rsSelect.next()){

            String event_id = rsSelect.getString("event_id");
            String domain_userid = rsSelect.getString("domain_userid");
            Timestamp collector_tstamp = rsSelect.getTimestamp("collector_tstamp");
            String se_category = rsSelect.getString("se_category");
            String se_action = rsSelect.getString("se_action");
            String se_label = rsSelect.getString("se_label");
            String se_property = rsSelect.getString("se_property");

            //some manipulations

            preparedStatement.setString(1, event_id);
            preparedStatement.setString(2, domain_userid);
            preparedStatement.setTimestamp(3, collector_tstamp);
            preparedStatement.setString(4, se_category);
            preparedStatement.setString(5, se_action);
            preparedStatement.setString(6, se_label);                        
            preparedStatement.setString(7, se_property);
            preparedStatement.addBatch(); 

            if(++count % batchSize == 0){
                preparedStatement.executeBatch();
                System.out.println("batch execution!");

            }               
        }
        System.out.println("out of loop");
        preparedStatement.executeBatch();
        preparedStatement.close();
        conn.commit();
        conn.close();   
  • http://stackoverflow.com/questions/6892105/bulk-insert-in-java-using-prepared-statements-batch-update – Pradeep Sep 24 '16 at 13:38
  • I have used batch insertion @Pradeep – Kumar Vivek Sep 24 '16 at 13:38
  • Use BULK INSERT - it is designed for exactly what you are asking and significantly increases the speed of inserts. Also, (just in case you really do have no indexes) you may want to consider adding indexes - some indexes (most of all, one on the primary key) may improve the performance of inserts. The actual rate at which you should be able to insert records will depend on the exact data, the table structure and also on the hardware / configuration of the SQL server itself, so I can't really give you any numbers. – Pradeep Sep 24 '16 at 13:49
  • https://msdn.microsoft.com/en-us/library/ms188365.aspx – Pradeep Sep 24 '16 at 13:50
  • Can you do your manipulations in SQL? If so, you can insert into your target table the results of the select query directly. If doing manipulation in SQL is not an option, write the results of your manipulated data into a file in S3 and use the redshift COPY command. See http://docs.aws.amazon.com/redshift/latest/dg/t_Loading-data-from-S3.html and http://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-bulk-inserts.html – DotThoughts Sep 26 '16 at 18:41
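Following the suggestion in the last comment, if the manipulation can be expressed in SQL, the whole job can stay on the cluster and no rows need to round-trip through JDBC at all. A minimal sketch, where the table names and the `upper(se_action)` transformation are placeholders for the real tables and manipulation:

```java
import java.sql.Connection;
import java.sql.Statement;

public class ServerSideInsert {

    // Builds an INSERT ... SELECT that runs entirely on Redshift.
    // upper(se_action) stands in for whatever manipulation is needed;
    // anything expressible in SQL can go here.
    static String buildInsertSelect(String targetTable, String sourceTable) {
        return "insert into " + targetTable
             + " (event_id, domain_userid, collector_tstamp,"
             + " se_category, se_action, se_label, se_property)"
             + " select event_id, domain_userid, collector_tstamp,"
             + " se_category, upper(se_action), se_label, se_property"
             + " from " + sourceTable;
    }

    // Executes the statement; one round trip instead of ~1M row transfers.
    static void run(Connection conn, String target, String source) throws Exception {
        try (Statement stmt = conn.createStatement()) {
            stmt.executeUpdate(buildInsertSelect(target, source));
        }
    }
}
```

This moves the work to where the data already lives, which is usually orders of magnitude faster than fetching rows to the client and batching them back.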

1 Answer


I had the same problem: it took too long to insert data from one Redshift table to another (I used Node.js). Initially it took around 18 minutes to insert 1 million records.

I figured out that the data in my table was not sorted according to the sort key (timestamp). It is imperative that your data is sorted by the sort key, and that you use that sort key in your WHERE predicate (if you have one). Run VACUUM on the table to 100 percent to sort your data, and after your manipulations, make sure you ORDER BY your sort key before inserting.
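A sketch of issuing that VACUUM from JDBC, assuming an open connection to Redshift. One caveat: VACUUM cannot run inside a transaction block, so autocommit has to be on while it executes (the question's code turns it off). The table name here is a placeholder:

```java
import java.sql.Connection;
import java.sql.Statement;

public class VacuumTable {

    // Builds the Redshift VACUUM command that re-sorts the table
    // fully according to its sort key.
    static String buildVacuum(String table) {
        return "vacuum full " + table + " to 100 percent";
    }

    static void vacuum(Connection conn, String table) throws Exception {
        boolean previous = conn.getAutoCommit();
        conn.setAutoCommit(true);  // VACUUM cannot run inside a transaction
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(buildVacuum(table));
        } finally {
            conn.setAutoCommit(previous);  // restore the caller's setting
        }
    }
}
```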

After doing this, the improvement was beyond what I expected: 1 million records inserted in about 3 seconds.
