
I am extending my earlier question, Spring Batch Performance issue when run two independent jobs sequentially?, here:

We're seeing behavior on Postgres v9.6 where saving data that contains null values via batch updates takes a very long time.

Is there a way to fix this, either on the database side or on the Spring Boot side?

Sample query:

"INSERT INTO test.ACCT "
    + "(rec_type, acct_type_cd, src_acct_id, stat_cd,stat_dttm, ........, "
    + "..................) "
    + "VALUES(?, ? , ?, )";

Note - for security reasons I can't include the column names as-is.

The table we're inserting into:

  • has a large proportion of NULLs in several columns
  • receives a large number of UPDATEs or DELETEs regularly
  • is not growing rapidly
  • has no indexes on it
  • does not use any triggers that might execute database functions or call functions directly

We insert 40K rows in total, in chunks of 2,500 records. Code:

List<Map<String, Object>> batchValues = new ArrayList<>(items.size());
for (Employee emp : items) {
    batchValues.add(new MapSqlParameterSource()
            // Parameter names and getters are redacted; each null VARCHAR is
            // replaced with "" here - per the comments below, passing the
            // nulls through directly is the slow case
            .addValue("", emp.getXXXXX() == null ? "" : emp.getXXXXX(), JDBCType.VARCHAR.getVendorTypeNumber())
            .addValue("", emp.getXXXXX() == null ? "" : emp.getXXXXX(), JDBCType.VARCHAR.getVendorTypeNumber())
            .addValue("", emp.getXXXXX() == null ? "" : emp.getXXXXX(), JDBCType.VARCHAR.getVendorTypeNumber())
            .addValue("", emp.getXXXXX() == null ? "" : emp.getXXXXX(), JDBCType.VARCHAR.getVendorTypeNumber())
            .addValue("", emp.getXXXXX(), JDBCType.DATE.getVendorTypeNumber())
            .addValue("", emp.getXXXXX() == null ? "" : emp.getXXXXX(), JDBCType.VARCHAR.getVendorTypeNumber())
            .addValue("", emp.getXXXXX() == null ? "" : emp.getXXXXX(), JDBCType.VARCHAR.getVendorTypeNumber())
            .addValue("", emp.getXXXXX() == null ? "" : emp.getXXXXX(), JDBCType.VARCHAR.getVendorTypeNumber())
            ........
            .getValues());
}

try {
    int[] updateCounts = namedJdbcTemplate.batchUpdate(SQL, batchValues.toArray(new Map[items.size()]));
} catch (Exception e) {
    // log the exception itself, not just a marker string
    log.error("Error occurred in BatchUpdate ##", e);
    throw new GenericException(e.getMessage(), this.getClass().getSimpleName());
}
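
One detail worth noting about the code above (a sketch, not our production code; the parameter names are hypothetical, taken from the columns in the sample query): MapSqlParameterSource.getValues() returns only the name-to-value map, so the JDBC types registered via addValue(...) are discarded when the batch is flattened to Map[]. Passing the SqlParameterSource objects themselves, with named placeholders, keeps the declared types available for null parameters:

import java.sql.Types;
import java.util.ArrayList;
import java.util.List;
import org.springframework.jdbc.core.namedparam.MapSqlParameterSource;
import org.springframework.jdbc.core.namedparam.SqlParameterSource;

String sql = "INSERT INTO test.ACCT (rec_type, acct_type_cd, src_acct_id) "
        + "VALUES (:rec_type, :acct_type_cd, :src_acct_id)";

List<SqlParameterSource> sources = new ArrayList<>(items.size());
for (Employee emp : items) {
    sources.add(new MapSqlParameterSource()
            // addValue(name, value, sqlType) registers the JDBC type on the
            // source; it is only retained if the source itself reaches batchUpdate
            .addValue("rec_type", emp.getXXXXX(), Types.VARCHAR)
            .addValue("acct_type_cd", emp.getXXXXX(), Types.VARCHAR)
            .addValue("src_acct_id", emp.getXXXXX(), Types.VARCHAR));
}

// The SqlParameterSource[] overload uses the declared types, even for nulls
int[] updateCounts = namedJdbcTemplate.batchUpdate(sql, sources.toArray(new SqlParameterSource[0]));

Spring derives the SQL type array for the whole batch from the first element, so the types should be registered consistently on every row.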

The batch jobs run sequentially: the first truncates the table (which is fast), and the second inserts in batches (with many null values); the insert is what eats up the performance.

Jeff Cook
  • Postgres 9.3 has reached EOL in 2018. See: https://www.postgresql.org/support/versioning/ Consider a current version of Postgres. Aside from that, please present a proper question. Consider the instructions here: https://stackoverflow.com/tags/postgresql-performance/info – Erwin Brandstetter Aug 25 '20 at 21:13
  • @ErwinBrandstetter - Inserting null values from the Spring Batch code is taking a lot of time and eating up performance. Is there any setting we need to apply on the Postgres side for null values? – Jeff Cook Aug 26 '20 at 07:22
  • Are you running the UPDATEs and DELETEs concurrently? Is it possible that they are blocking each other? Did you try running those batched UPDATEs using native JDBC to rule out problems with Spring Data (and possibly JPA)? Doing UPDATEs in "chunks" does not usually improve performance in Postgres. A single commit at the end of the transaction is very often the fastest way, and 40k isn't a particularly large transaction to begin with. –  Aug 26 '20 at 07:41
  • @a_horse_with_no_name - No, it's not concurrent; on a daily basis we first truncate the data and then load fresh data (a business need, as the data is raw for the business), so they are not blocking each other. Yes, I am using JdbcTemplate and NamedParameterJdbcTemplate. No, even a chunk size of 500 doesn't improve performance much. We prefer chunk-based processing, so we want to commit chunk-wise for numerous reasons. – Jeff Cook Aug 26 '20 at 08:14
  • Please don't ask [the same question](https://stackoverflow.com/questions/63572701/spring-batch-performance-issue-when-run-two-independent-jobs-sequentially) again! Instead, modify/improve your original one. – M. Deinum Aug 26 '20 at 08:21
  • How long does it actually take? On my laptop I can insert 50k rows (1.5kb each) using plain JDBC with single-row inserts in about 6 seconds. With batched inserts, this goes down to about 2.5 seconds. Using the Copy API this goes down to about 1.5 seconds –  Aug 26 '20 at 08:37
  • @a_horse_with_no_name - Inserting around 1800 records via a CompositeItemWriter that writes to 3 different tables takes around 20 mins. Have you tried inserting NULL values into more columns? I am pretty sure Postgres is taking the time while inserting null records. I am inserting values for 10 columns, of which 5 columns have null values. – Jeff Cook Aug 26 '20 at 08:57
  • It actually gets faster with many NULL values - which makes sense, as less data needs to be sent to the database and written. So this is most likely some kind of Spring problem –  Aug 26 '20 at 09:00
  • Your problem sounds suspiciously like this: https://stackoverflow.com/questions/37942063/slow-insert-on-postgresql-using-jdbc –  Aug 26 '20 at 09:02
  • @a_horse_with_no_name - If I use an empty string ("") instead of null, it takes 55 seconds for me; if I insert null values, it takes 15-20 mins. I am not sure why Postgres dislikes the NULL values and causes issues. – Jeff Cook Aug 26 '20 at 09:02
  • Again: it's not Postgres, it's Spring –  Aug 26 '20 at 09:03
  • @a_horse_with_no_name - Ah yes, but what fix would I need to put in on the Spring end? – Jeff Cook Aug 26 '20 at 09:04

1 Answer


Thanks to M. Deinum and a_horse_with_no_name for the help; they always provide great help.

As suggested in Slow insert on PostgreSQL using JDBC and in Section 5.3.2 of the Spring reference documentation (https://docs.spring.io/spring/docs/current/spring-framework-reference/data-access.html#jdbc-batch-list):

In such a scenario, with automatic setting of values on an underlying PreparedStatement, the corresponding JDBC type for each value needs to be derived from the given Java type. While this usually works well, there is a potential for issues (for example, with Map-contained null values). Spring, by default, calls ParameterMetaData.getParameterType in such a case, which can be expensive with your JDBC driver. You should use a recent driver version and consider setting the spring.jdbc.getParameterType.ignore property to true (as a JVM system property or in a spring.properties file in the root of your classpath) if you encounter a performance issue — for example, as reported on Oracle 12c (SPR-16139).

Alternatively, you might consider specifying the corresponding JDBC types explicitly, either through a 'BatchPreparedStatementSetter' (as shown earlier), through an explicit type array given to a 'List<Object[]>' based call, through 'registerSqlType' calls on a custom 'MapSqlParameterSource' instance, or through a 'BeanPropertySqlParameterSource' that derives the SQL type from the Java-declared property type even for a null value.
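
As a minimal sketch of the last option (assuming the Employee getters line up with named parameters in the SQL; statCd and the SQL here are hypothetical, since the real columns are redacted):

import org.springframework.jdbc.core.namedparam.SqlParameterSource;
import org.springframework.jdbc.core.namedparam.SqlParameterSourceUtils;

// createBatch wraps each bean in a BeanPropertySqlParameterSource, which
// derives the SQL type from the declared Java property type, so a null
// statCd is still sent to the driver as a typed VARCHAR null.
SqlParameterSource[] batch = SqlParameterSourceUtils.createBatch(items.toArray());
int[] counts = namedJdbcTemplate.batchUpdate(
        "INSERT INTO test.ACCT (stat_cd, ........) VALUES (:statCd, ........)", batch);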

I created a spring.properties file and added spring.jdbc.getParameterType.ignore=true, which solved my issue; loading 1800 records into 3 different tables (10 columns, 5 of them NULL) now takes just 7-10 seconds.
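
For reference, the whole fix is one line in a file at the root of the classpath:

# src/main/resources/spring.properties
spring.jdbc.getParameterType.ignore=true

Alternatively, per the reference documentation quoted above, the same flag can be passed as a JVM system property: -Dspring.jdbc.getParameterType.ignore=true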

Jeff Cook