
I need to query a database and copy the result set into another database, which has the same structure and is on the same host system.

The following Java function works pretty well (fast and without errors) as long as the query result is small:

public void copyTableData(Connection dbConnOnSrcDB, Connection dbConnOnDestDB,
                          String sqlQueryOnSrcDB, String tableNameOnDestDB) 
  throws SQLException {

    try (

        PreparedStatement prepSqlStatmOnSrcDB = dbConnOnSrcDB.prepareStatement(sqlQueryOnSrcDB);

        ResultSet sqlResultsFromSrcDB = prepSqlStatmOnSrcDB.executeQuery()
    ) {
        ResultSetMetaData sqlMetaResults = sqlResultsFromSrcDB.getMetaData();

        // Collect the column names of the query result
        List<String> columnsOfQuery = new ArrayList<>();

        for (int i = 1; i <= sqlMetaResults.getColumnCount(); i++)
            columnsOfQuery.add(sqlMetaResults.getColumnName(i));

        try (
            PreparedStatement prepSqlStatmOnDestDB = dbConnOnDestDB.prepareStatement(
                "INSERT INTO " + tableNameOnDestDB +
                     " (" + columnsOfQuery.stream().collect(Collectors.joining(", ")) + ") " +
                        "VALUES (" + columnsOfQuery.stream().map(c -> "?").collect(Collectors.joining(", ")) + ")")
        ) {

            while (sqlResultsFromSrcDB.next()) {
                for (int i = 1; i <= sqlMetaResults.getColumnCount(); i++)
                    prepSqlStatmOnDestDB.setObject(i, sqlResultsFromSrcDB.getObject(i));

                prepSqlStatmOnDestDB.addBatch();
            }
            prepSqlStatmOnDestDB.executeBatch();
        }
    }
}

But I have very large database queries and result sets in the range of several hundred megabytes.

Problem A: I found out that the OutOfMemoryError below is thrown when the second line of code is processed:

ResultSet sqlResultsFromSrcDB = prepSqlStatmOnSrcDB.executeQuery()

Java exception:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.Class.getDeclaredFields0(Native Method)
at java.lang.Class.privateGetDeclaredFields(Class.java:2583)
at java.lang.Class.getDeclaredField(Class.java:2068)
at java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl$1.run(AtomicReferenceFieldUpdater.java:323)
at java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl$1.run(AtomicReferenceFieldUpdater.java:321)
at java.security.AccessController.doPrivileged(Native Method)
at java.util.concurrent.atomic.AtomicReferenceFieldUpdater$AtomicReferenceFieldUpdaterImpl.<init>(AtomicReferenceFieldUpdater.java:320)
at java.util.concurrent.atomic.AtomicReferenceFieldUpdater.newUpdater(AtomicReferenceFieldUpdater.java:110)
at java.sql.SQLException.<clinit>(SQLException.java:372)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2156)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:300)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:428)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:354)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:169)
at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:117)
at Application.copyTableData(Application.java:159)
at Application.main(Application.java:585)

Problem B: The copy job takes a very long time. Is there a way to speed up the copy process?

The DB queries are:

String[] tables = new String[]{
                    "table1",
                    "table1_properties",
                    "table1_addresses",
                    "table2",
                    "table3",
                    "table4",
                    "table5",
                    "table6",
                    "table7",
                    "table8",
                    "table9",
                    "table10"
            };

Function call:

for( String table : tables ){

  getDataFromSrcDB = "SELECT " + table + ".* " +
    "FROM table1 " +
        "FULL JOIN table1_properties " +
            "ON table1_properties.d_id=table1.d_id " +
        "FULL JOIN table1_addresses " +
            "ON table1_addresses.d_id=table1_properties.d_id " +
        "FULL JOIN table2 " +
            "ON table2.p_id=table1_properties.p_id " +
        "FULL JOIN table3 " +
            "ON table3.d_id=table1.d_id " +
        "FULL JOIN table4 " +
            "ON table4.d_id=table1.d_id " +
        "FULL JOIN table5 " +
            "ON table5.d_id=table1.d_id " +
        "FULL JOIN table6 " +
            "ON table6.d_id=table1.d_id " +
        "FULL JOIN table7 " +
            "ON table7.d_id=table1.d_id " +
        "FULL JOIN table8 " +
            "ON table8.id=table4.id " +
        "FULL JOIN table9 " +
            "ON table9.d_id=table1.d_id " +
        "FULL JOIN table10 " +
            "ON table10.a_id=table1_addresses.a_id " +
        "WHERE ST_Intersects(ST_MakeEnvelope(" +
               minLong + "," +
               minLat + "," +
               maxLong + "," +
               maxLat + ",4326), geom :: GEOMETRY) OR " +
        "ST_Intersects(ST_MakeEnvelope(" +
               minLong + "," +
               minLat + "," +
               maxLong + "," +
               maxLat + ",4326), CAST(table3.location AS GEOMETRY))";

   copyTableData(dbConnOnSrcDB, dbConnOnDestDB, getDataFromSrcDB, table);
}
Peter
  • Which RDBMS are you using? – Youcef LAIDANI May 07 '18 at 19:54
  • I use PostgreSQL version 9.6.4 – Peter May 07 '18 at 20:19
  • I don’t know about Postgres, but in SQL Server you can make it a database-only batch operation with insert/select. – Nick.Mc May 07 '18 at 22:40
  • By default, the Postgres JDBC driver loads the entire resultset into memory. See [the manual](https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor) for details on how to avoid that. –  May 08 '18 at 09:25

3 Answers


When the size of the batch is huge, you get this error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

I have a couple of solutions.

First Solution

Instead, you can divide the work into smaller batches, for example persisting the data every 1,000 rows. You also need some configuration, as Mark Rotteveel mentions in the comments, and as the documentation describes under Getting results based on a cursor:

By default the driver collects all the results for the query at once. This can be inconvenient for large data sets so the JDBC driver provides a means of basing a ResultSet on a database cursor and only fetching a small number of rows.

So what you should do:

  • The connection to the server must be using the V3 protocol.
  • The Connection must not be in autocommit mode.
  • The query given must be a single statement.
  • The fetch size of the Statement must be set to an appropriate value.
  • ..read the details in the documentation

In this case your code can look like this:

// Note: auto-commit must be disabled on the source connection
dbConnOnSrcDB.setAutoCommit(false);

final int batchSize = 1000;
final int fetchSize = 50;
int count = 0;
...
// Set the fetch size on the source statement BEFORE calling executeQuery(),
// otherwise the driver still loads the whole result set into memory at once
prepSqlStatmOnSrcDB.setFetchSize(fetchSize);
...
while (sqlResultsFromSrcDB.next()) {
    for (int i = 1; i <= sqlMetaResults.getColumnCount(); i++) {
        prepSqlStatmOnDestDB.setObject(i, sqlResultsFromSrcDB.getObject(i));
    }
    prepSqlStatmOnDestDB.addBatch();
    if (++count % batchSize == 0) {
        prepSqlStatmOnDestDB.executeBatch();
    }
}
prepSqlStatmOnDestDB.executeBatch(); // insert remaining records

Second Solution

Because you are using PostgreSQL, I would suggest using dblink to transfer data from one database to the other.
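
For example, assuming the dblink extension is installed on the destination database, the transfer can be expressed as a single INSERT ... SELECT that the destination server runs itself, so the rows never pass through the Java application. The connection string, table name and column definition list below are placeholders and must match your actual schema:

// Sketch only: the destination database pulls the rows via dblink.
// Connection string, table and column definition list are placeholders.
try (Statement stmt = dbConnOnDestDB.createStatement()) {
    stmt.executeUpdate(
        "INSERT INTO table1 " +
        "SELECT * FROM dblink(" +
        "'host=localhost dbname=source_db user=src_user password=src_pw', " +
        "'SELECT d_id, name, geom FROM table1'" +
        ") AS t(d_id integer, name text, geom geometry)");
}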



Youcef LAIDANI
  • I found out that this OutOfMemoryError is raising, when the second line of code is processed: ResultSet sqlResultsFromSrcDB = prepSqlStatmOnSrcDB.executeQuery() – Peter May 07 '18 at 20:15
  • @pier I don't understand the problem with that line, did you try my solutions? – Youcef LAIDANI May 07 '18 at 20:21
  • I tried your first solution, but the OutOfMemoryError is raised again when ResultSet sqlResultsFromSrcDB = prepSqlStatmOnSrcDB.executeQuery() is processed. So the problem is in this line of code... – Peter May 07 '18 at 20:24
  • @pier Wow, now I understand the problem. Your query is really complicated, so the query is the reason. I would focus on this query and try to minimize it; you could ask another question about minimizing it. Sorry, I'm not an expert in this. – Youcef LAIDANI May 07 '18 at 20:42
  • IIRC, the PostgreSQL driver will load all rows in memory in auto-commit mode. Disabling auto-commit may help in this case. – Mark Rotteveel May 09 '18 at 16:16
  • @MarkRotteveel did you mean `dbConnOnSrcDB.setAutoCommit(false);` ? – Youcef LAIDANI May 09 '18 at 16:31
  • Yes, and I also see you'll need to explicitly set a fetch size, see also https://jdbc.postgresql.org/documentation/head/query.html#query-with-cursor Combined with your suggestion to execute the batch in increments, that should make it work. – Mark Rotteveel May 09 '18 at 16:32
  • Thank you @MarkRotteveel, I edited my answer. Can you add your touch if I missed something, please? – Youcef LAIDANI May 09 '18 at 16:53
  • Looks good, although I'd suggest to also explicitly say that setting a fetch size is needed. – Mark Rotteveel May 09 '18 at 16:57
  • Thank you @MarkRotteveel I always learn a lot from you I appreciate it – Youcef LAIDANI May 09 '18 at 17:02

There are many ways to achieve this. Here are a few options you can apply:

  1. Read the data from the first database, write it into a CSV file, and then read the CSV file in chunks and write it to the other database. (Easy to implement, but more coding; see the sketch after this list.) https://gauravmutreja.wordpress.com/2011/10/13/exporting-your-database-to-csv-file-in-java/

  2. If you don't need much data manipulation before transferring the data to the other database, you can write a simple DB function to read data from one database and write it into the other.

  3. Or you can try Spring Batch to do this.
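
As a rough illustration of option 1, here is a minimal sketch that uses the PostgreSQL JDBC driver's CopyManager API to dump a query to a CSV file and load it into the destination table. The method name and temporary-file handling are illustrative only, and the cast to BaseConnection assumes plain pgjdbc connections (not pooled wrappers):

import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.SQLException;
import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

// Sketch only: export the query result to a CSV file, then bulk-load it
// into the destination table with COPY.
public void copyViaCsv(Connection dbConnOnSrcDB, Connection dbConnOnDestDB,
                       String sqlQueryOnSrcDB, String tableNameOnDestDB)
        throws SQLException, IOException {

    Path csvFile = Files.createTempFile("table_copy", ".csv");

    // COPY ... TO STDOUT streams the query result out of the source database
    CopyManager srcCopy = new CopyManager((BaseConnection) dbConnOnSrcDB);
    try (Writer out = Files.newBufferedWriter(csvFile)) {
        srcCopy.copyOut("COPY (" + sqlQueryOnSrcDB + ") TO STDOUT WITH (FORMAT csv)", out);
    }

    // COPY ... FROM STDIN bulk-loads the CSV file into the destination table
    CopyManager destCopy = new CopyManager((BaseConnection) dbConnOnDestDB);
    try (Reader in = Files.newBufferedReader(csvFile)) {
        destCopy.copyIn("COPY " + tableNameOnDestDB + " FROM STDIN WITH (FORMAT csv)", in);
    }

    Files.delete(csvFile);
}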
Avhi

When you fetch the data directly, it fills RAM until all of the data has been fetched, so you can easily hit the OutOfMemoryError.

If you fetch the data as a stream, you can handle an unlimited amount of data: the rows are fetched and processed in parts (sized by fetchSize), and the RAM used by already processed rows can be reclaimed before the next chunk is fetched.
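
In practice (a minimal sketch, assuming the PostgreSQL JDBC driver), this streaming fetch comes down to disabling auto-commit and setting a fetch size on the source statement before the query is executed:

// Minimal sketch: fetch the source rows in chunks instead of all at once.
dbConnOnSrcDB.setAutoCommit(false);              // cursor-based fetching requires auto-commit off
try (PreparedStatement stmt = dbConnOnSrcDB.prepareStatement(sqlQueryOnSrcDB)) {
    stmt.setFetchSize(50);                       // fetch 50 rows per round trip
    try (ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
            // process one row at a time, e.g. add it to an insert batch on the destination
        }
    }
}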

utrucceh