I'm running an experiment to see which approach to inserting multiple records into a PostgreSQL database gives the best performance. I'm using "PostgreSQL 9.5.3, compiled by Visual C++ build 1800, 64-bit", Java 1.8, and the PostgreSQL JDBC driver 9.4.1207.
Table definition for this experiment is as follows:
CREATE TABLE mytable (ea VARCHAR(256) NOT NULL, ehv CHAR(60) NOT NULL);
ALTER TABLE mytable ADD CONSTRAINT pk_mytable_ea PRIMARY KEY (ea);
The first approach is executeBatch, where I build an array of parameter maps and hand it to batchUpdate. The code looks like this:
public void executeBatch(List<MyObject> myObjects) {
    final String sql = "INSERT INTO mytable(ea, ehv) VALUES (:ea, :ehv)";

    MapSqlParameterSource[] params = new MapSqlParameterSource[myObjects.size()];
    for (int i = 0; i < myObjects.size(); i++) {
        params[i] = new MapSqlParameterSource()
                .addValue("ea", myObjects.get(i).getEa())
                .addValue("ehv", myObjects.get(i).getEhv());
    }

    jdbcTemplate.batchUpdate(sql, params);
}
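As I understand it, Spring's batchUpdate here boils down to roughly the following plain JDBC (just a sketch of the equivalent, not the actual Spring internals; plainJdbcBatch is a hypothetical helper and assumes java.sql.Connection and java.sql.PreparedStatement):

public void plainJdbcBatch(Connection connection, List<MyObject> myObjects) throws SQLException {
    final String sql = "INSERT INTO mytable(ea, ehv) VALUES (?, ?)";
    try (PreparedStatement ps = connection.prepareStatement(sql)) {
        for (MyObject o : myObjects) {
            ps.setString(1, o.getEa());
            ps.setString(2, o.getEhv());
            ps.addBatch();   // queue one single-row INSERT per object
        }
        ps.executeBatch();   // flush the queued statements to the server
    }
}

So, as far as I can tell, the server still executes one single-row INSERT per MyObject; the rows are merely queued and sent together.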
The second approach is a bit more exotic: it uses Jackson to serialize the objects to a JSON string, which is passed as a jsonb parameter, and then jsonb_to_recordset to expand that JSON into a set of records that I insert into mytable.
public void insertFromString(List<MyObject> myObjects) throws RepositoryException {
    final String sql = "INSERT INTO mytable(ea, ehv)\n" +
            "select ea, ehv\n" +
            "from jsonb_to_recordset(:myObjectsJson) as x(ea text, ehv text)\n" +
            "ON CONFLICT DO NOTHING";

    ObjectMapper jsonObjectMapper = new ObjectMapper();
    String myObjectsJson = null;
    try {
        myObjectsJson = jsonObjectMapper.writeValueAsString(myObjects);
    } catch (JsonProcessingException ex) {
        throw new RepositoryException("Error while transforming myObjects to json.", ex);
    }

    PGobject jsonObject = new PGobject();
    jsonObject.setType("jsonb");
    try {
        jsonObject.setValue(myObjectsJson);
    } catch (SQLException ex) {
        throw new RepositoryException("Error while building jsonb object.", ex);
    }

    jdbcTemplate.update(sql, new MapSqlParameterSource().addValue("myObjectsJson", jsonObject));
}
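To make that concrete, this is roughly the payload Jackson produces and that jsonb_to_recordset unnests (a small sketch using a map instead of MyObject, so it does not depend on the bean's exact property names; assumes java.util imports):

ObjectMapper mapper = new ObjectMapper();
Map<String, String> row = new LinkedHashMap<>();
row.put("ea", "0ea");
row.put("ehv", "0ehv");
String json = mapper.writeValueAsString(Collections.singletonList(row));
// json is now: [{"ea":"0ea","ehv":"0ehv"}]
// jsonb_to_recordset(:myObjectsJson) as x(ea text, ehv text) turns each array element
// into an (ea, ehv) row, so the whole batch is written by a single INSERT ... SELECT.

In other words, the entire batch travels to the server as one statement with one jsonb parameter.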
I've compared these two approaches using the following test where I create 10,000 MyObjects.
@Test
public void test() throws JsonProcessingException, RepositoryException {
    List<MyObject> myObjects = new ArrayList<>(10000);
    for (int i = 0; i < 10000; i++) {
        myObjects.add(new MyObject(i + "ea", i + "ehv"));
    }

    long before = System.currentTimeMillis();
    censusRepository.executeBatch(myObjects);
    long after = System.currentTimeMillis();
    System.out.println("executeBatch Took - " + (after - before));

    namedParameterJdbcTemplate.update("delete from mytable", new MapSqlParameterSource());

    before = System.currentTimeMillis();
    censusRepository.insertFromString(myObjects);
    after = System.currentTimeMillis();
    System.out.println("insertFromString Took - " + (after - before));

    namedParameterJdbcTemplate.update("delete from mytable", new MapSqlParameterSource());
}
Running the test, I see the second approach performing much better:
executeBatch Took - 464ms
insertFromString Took - 253ms
What is causing the second approach to perform so much better than executeBatch? From your experience, which approach should I take? Could you suggest another one that could give me even better performance? My constraints: I will be inserting batches of 1,000 MyObjects into the database, and each MyObject contains two String values.