
I'm performing an experiment to see which approach to inserting multiple records into a PostgreSQL database gives the best performance. I'm using "PostgreSQL 9.5.3, compiled by Visual C++ build 1800, 64-bit" + Java 1.8 + Postgres JDBC Driver 9.4.1207.

Table definition for this experiment is as follows:

CREATE TABLE mytable (ea VARCHAR(256) NOT NULL, ehv CHAR(60) NOT NULL);

ALTER TABLE mytable ADD CONSTRAINT pk_mytable_ea PRIMARY KEY (ea);

The first approach uses executeBatch, where I create a list of parameter maps. The code looks like this:

public void executeBatch(List<MyObject> myObjects) {
    final String sql = "INSERT INTO mytable(" +
            "    ea, ehv) " +
            "    VALUES (:ea, :ehv) ";

    MapSqlParameterSource[] params = new MapSqlParameterSource[myObjects.size()];
    for (int i = 0; i < myObjects.size(); i++) {
        params[i] = new MapSqlParameterSource()
                .addValue("ea", myObjects.get(i).getEa())
                .addValue("ehv", myObjects.get(i).getEhv());
    }

    jdbcTemplate.batchUpdate(sql, params);
}

The second approach is a bit more exotic: it uses Jackson to serialize the objects to a JSON string, then jsonb_to_recordset to expand that JSON into a set of records, which I then insert into mytable.

public void insertFromString(List<MyObject> myObjects) throws RepositoryException {
    final String sql = "INSERT INTO mytable(ea, ehv)\n" +
            "select ea, ehv\n" +
            "from jsonb_to_recordset(:myObjectsJson) as x(ea text, ehv text)\n" +
            "ON CONFLICT DO NOTHING";

    ObjectMapper jsonObjectMapper = new ObjectMapper();
    String myObjectsJson;
    try {
        myObjectsJson = jsonObjectMapper.writeValueAsString(myObjects);
    } catch (JsonProcessingException ex) {
        throw new RepositoryException("Error while transforming myObjects to json.", ex);
    }

    PGobject jsonObject = new PGobject();
    jsonObject.setType("jsonb");
    try {
        jsonObject.setValue(myObjectsJson);
    } catch (SQLException ex) {
        throw new RepositoryException("Error while building jsonb object.", ex);
    }

    jdbcTemplate.update(sql, new MapSqlParameterSource().addValue("myObjectsJson", jsonObject));
}
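For clarity, the string bound to :myObjectsJson is just a JSON array of objects, one per row. A minimal stand-alone sketch of that shape (built with plain string formatting here instead of Jackson, and using plain string pairs rather than my MyObject class, so it runs on its own):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

public class JsonPayloadSketch {
    // Builds the same JSON array shape that Jackson produces for a List<MyObject>:
    // [{"ea":"0ea","ehv":"0ehv"},{"ea":"1ea","ehv":"1ehv"}, ...]
    static String toJsonArray(List<String[]> rows) {
        StringJoiner sj = new StringJoiner(",", "[", "]");
        for (String[] row : rows) {
            sj.add(String.format("{\"ea\":\"%s\",\"ehv\":\"%s\"}", row[0], row[1]));
        }
        return sj.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[]{"0ea", "0ehv"});
        rows.add(new String[]{"1ea", "1ehv"});
        // jsonb_to_recordset(...) as x(ea text, ehv text) expands this array
        // into one row per object.
        System.out.println(toJsonArray(rows));
    }
}
```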

I've compared these two approaches using the following test where I create 10,000 MyObjects.

@Test
public void test() throws JsonProcessingException, RepositoryException {
    List<MyObject> myObjects = new ArrayList<>(10000);
    for (int i = 0; i < 10000; i++) {
        myObjects.add(new MyObject(i + "ea", i + "ehv"));
    }

    long before = System.currentTimeMillis();
    censusRepository.executeBatch(myObjects);
    long after = System.currentTimeMillis();
    System.out.println("executeBatch Took - " + (after - before));

    namedParameterJdbcTemplate.update("delete from mytable", new MapSqlParameterSource());

    before = System.currentTimeMillis();
    censusRepository.insertFromString(myObjects);
    after = System.currentTimeMillis();
    System.out.println("insertFromString Took - " + (after - before));

    namedParameterJdbcTemplate.update("delete from mytable", new MapSqlParameterSource());
}

Running the test, I see the second approach performs much better:

executeBatch Took - 464ms
insertFromString Took - 253ms
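I'm aware a single wall-clock measurement like this is noisy (JIT warm-up, caches, connection state). A minimal sketch of how the comparison could average several timed runs instead (the `Runnable` workload is a placeholder, not my repository code):

```java
public class TimingSketch {
    // Runs the workload `warmup` untimed times, then `runs` timed times,
    // and returns the mean elapsed time of the timed runs in milliseconds.
    static double averageMillis(Runnable workload, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            workload.run();
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            workload.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (runs * 1_000_000.0);
    }

    public static void main(String[] args) {
        // e.g. () -> censusRepository.executeBatch(myObjects) in the real test
        double ms = averageMillis(() -> { }, 2, 5);
        System.out.println("mean ms: " + ms);
    }
}
```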

What makes the second approach perform so much better than executeBatch? From your experience, which approach should I take? Could you suggest another one that would give even better performance? My constraints: I will be inserting batches of 1,000 MyObjects, and each MyObject contains two String values.
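For context, a third variant I've seen suggested is a single multi-row VALUES statement, so all rows travel in one statement instead of one statement per row. A minimal sketch of building such a statement (hypothetical helper, not part of my repository code; the positional parameters would still need to be bound in order):

```java
import java.util.StringJoiner;

public class MultiRowInsertSketch {
    // Builds "INSERT INTO mytable(ea, ehv) VALUES (?, ?), (?, ?), ..."
    // with one (?, ?) group per row to insert.
    static String buildMultiRowInsert(int rowCount) {
        StringJoiner values = new StringJoiner(", ");
        for (int i = 0; i < rowCount; i++) {
            values.add("(?, ?)");
        }
        return "INSERT INTO mytable(ea, ehv) VALUES " + values;
    }

    public static void main(String[] args) {
        System.out.println(buildMultiRowInsert(3));
        // INSERT INTO mytable(ea, ehv) VALUES (?, ?), (?, ?), (?, ?)
    }
}
```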

Anton Belev
    [`COPY ...`](https://www.postgresql.org/docs/current/static/sql-copy.html) < `INSERT INTO ... SELECT` < `INSERT INTO ... VALUES (...), (...) ...` < N × `INSERT INTO ... VALUES (...)`. Almost always. Executing multiple statements adds communication, preparation, etc. costs *per statement*. Binding several parameters to a single statement also. – pozs Mar 16 '17 at 16:34
  • See f.ex. [this (quite old, but still relevant) question](http://stackoverflow.com/questions/8134602/psycopg2-insert-multiple-rows-with-one-query). – pozs Mar 16 '17 at 16:36
