
I work on a system which downloads data from a cloud system to a local database (PostgreSQL, MySQL, ...). Now I'm having an issue with PostgreSQL performance because it takes a lot of time to insert the data.

The number of columns and the size of the data may vary. In a sample project, I have a table with approx. 170 columns. There is one unique index - but even after dropping the index, the speed of the insert did not change.

I'm using the JDBC driver to connect to the database, and I'm inserting data in batches of 250 rows (using NamedParameterJdbcTemplate).

Inserting the data into Postgres took approx. 18 seconds. The same data set on MySQL took just a second. That's a huge difference - where does it come from? Is the Postgres JDBC driver that slow? Can it be configured somehow to make it faster? Am I missing something else? Any other ideas how to make it faster?

I made a sample project which is available on GitHub - https://github.com/varad/postgresql-vs-mysql. Everything happens in the LetsGo class in the "run" method.
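For context, the batching described above works roughly like this (a self-contained sketch with illustrative names, not the actual project code; the real `NamedParameterJdbcTemplate.batchUpdate` call needs a live DataSource and is therefore only hinted at in the comments):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSketch {
    static final int BATCH_SIZE = 250;

    // Split the downloaded rows into batches of at most 250 rows,
    // mirroring how the sample project feeds NamedParameterJdbcTemplate.batchUpdate.
    static <T> List<List<T>> toBatches(List<T> rows) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += BATCH_SIZE) {
            batches.add(rows.subList(i, Math.min(i + BATCH_SIZE, rows.size())));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) rows.add(i);

        List<List<Integer>> batches = toBatches(rows);
        System.out.println(batches.size());        // number of batches
        System.out.println(batches.get(0).size()); // rows per full batch

        // Each batch would then be inserted in one round trip, e.g.:
        // template.batchUpdate(INSERT_SQL, batchOfSqlParameterSources);
    }
}
```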

varad
  • Could you enable more debugging to see what is actually slow (insert, commit, connecting)? For driver-side debugging in Java use loglevel=2 (https://jdbc.postgresql.org/documentation/80/connect.html); for server-side logging see https://www.drupal.org/node/560192 – Bartosz Bilicki Jun 21 '16 at 10:37
  • Which versions are we talking about? Did you try with a transaction? – m0skit0 Jun 21 '16 at 10:43
  • Also, did you try reversing and making it insert on MySQL first and see what happens (like `letsGo.run(Type.MYSQL); letsGo.run(Type.POSTGRES);`)? Also how are you checking the times? – m0skit0 Jun 21 '16 at 10:46
  • There are at least two things to investigate: the batch update and the number of columns. You should modify your test to isolate which one is the problem. – Aris2World Jun 21 '16 at 10:48
  • I'm currently on PostgreSQL 9.3.3 and JDBC driver 9.2-1003-jdbc4. I tried even a newer version of the driver without any success. – varad Jun 21 '16 at 10:53
  • The order of "letsGo.run(Type.MYSQL); letsGo.run(Type.POSTGRES);" does not make any difference. – varad Jun 21 '16 at 10:53
  • Would you please do this: `create table t1 as select * from your_table limit 250;`, then `pg_dump --inserts -t t1` to a file, and then run the file in `psql`, measuring the time (`\timing on` switch) - this will give you the expected speed of an `insert` of 250 rows on your machine. Then create the index etc. and measure again. – Vao Tsun Jun 21 '16 at 11:27
  • I enabled the "loglevel=2" and here's what I got: https://github.com/varad/postgresql-vs-mysql/blob/234e20be520f1950c0f74820ec4e220506fe684a/logs/loglevel2.log. (Or big full log: https://github.com/varad/postgresql-vs-mysql/blob/master/logs/loglevel2_full_log.zip) I removed a lot of similar lines and added a ">>>>" comment to the log. My understanding of the output is - it is parsing the INSERT statement a lot of times... – varad Jun 21 '16 at 11:40
  • Vao Tsun: I exported the data and then imported it using `psql`... and it completes in just 1.5 seconds. So I guess the issue lies in the JDBC driver. – varad Jun 21 '16 at 12:10
  • I believe you could have autocommit enabled - so it commits every INSERT statement separately. If it forces a buffer flush to disk on each commit, that could be the delay. Please wrap the 250 inserts in `begin;` ... `commit;` and measure again - the timing will show whether this is the cause. Sorry for such reverse engineering, but I think in this case it is the fastest way of diagnosing. – Vao Tsun Jun 21 '16 at 12:25
  • can you set `conn.setAutoCommit(false);` ?.. – Vao Tsun Jun 21 '16 at 12:29
  • Unfortunately, that did not help either. (I set that explicitly by calling setAutoCommit method, though I believe it is disabled by default). – varad Jun 21 '16 at 13:21
  • Ensure that you have full primary keys and appropriate indices defined on the table you're trying to insert into. You could also try partitioning your data being inserted into chunks and run it on a number of threads on separate machines, if you have access to that. – ManoDestra Jun 23 '16 at 18:38
  • I know I already mentioned it, but just in case - what happens when you comment out line 95? https://github.com/varad/postgresql-vs-mysql/blob/master/src/main/java/LetsGo.java#L95 – Vao Tsun Jun 23 '16 at 21:12
  • What storage engine are you using for MySQL? If it's MyISAM, it will be faster. And also non-ACID-compliant. See http://stackoverflow.com/questions/277440/mysql-myisam-vs-inno-db It's a bit old, but still does a good job explaining why you don't want to use MyISAM. – Andrew Henle Jun 24 '16 at 10:28
  • @AndrewHenle: that might be the case but 18 seconds for only 250 rows is way too much in Postgres. –  Jun 24 '16 at 10:36
  • @a_horse_with_no_name The question says "batches of 250 rows". I don't see any reference to the total number of rows inserted. I'm assuming that means each transaction is 250 rows. – Andrew Henle Jun 24 '16 at 10:37
  • @AndrewHenle: but the final row count in the table after running the program is 250. –  Jun 24 '16 at 10:39
  • @a_horse_with_no_name If the total is 250 rows, I'd say 1 second is too much, too. It'd be useful to know the hardware being used along with the actual IO patterns both databases use while doing the insert. – Andrew Henle Jun 24 '16 at 10:40
  • When enabling statement logging, I can see that the statement is prepared approx. 20,000 (yes, 20 **thousand**) times before the actual inserts start. Once the actual inserts start, they only take approx. 1 second. But the 20,000 prepare statements take ~30 seconds on my computer. I don't know if this is caused by Spring or some configuration option. –  Jun 24 '16 at 10:41
  • @a_horse_with_no_name you can disable the prepared statements with the useServerPrepStmts parameter in the connection string. See my answer. – stepanian Jun 24 '16 at 10:57
  • On MySQL I use InnoDB. In the app on GitHub I only insert the total of 250 records. In the real application, I download 250 rows from the cloud, insert them, download next 250 and insert them, etc. The number of records may vary. – varad Jun 24 '16 at 11:11
  • "prepared approx. 20,000 (yes 20 thousand)" - good point. I can't think of any reason for that behavior right now. I would kind of understand if it was 250 times... but 20,000?! – varad Jun 24 '16 at 11:13

3 Answers


It seems that this is a combination of a Spring "bug" and a driver "bug".

Spring tries to determine the data type of a column each time setValue() is called. It does this by calling PreparedStatement.getParameterMetaData().

This apparently causes a "prepare" statement to be sent to the database. Each call is quite fast in itself (never more than 1ms on my laptop), but as it is made for each column of each row, it adds up to a lot of time (it is made for every non-null value, which results in approx. 23,000 calls).

To some extent this is more a Spring bug than a driver bug, because not caching the parameter meta data doesn't really make sense (at least in my opinion). The MySQL JDBC driver does not support getParameterMetaData(), and Spring knows this, so this "bug" doesn't show up with MySQL because Spring never calls that method.

I am not sure if Postgres' JDBC driver behavior can be classified as a bug, but it sure would be nice if the driver was caching that meta data after the first call.

Spring can be convinced not to obtain the statement meta data through the property spring.jdbc.getParameterType.ignore.

So by putting:

System.setProperty("spring.jdbc.getParameterType.ignore", "true");

before the line:

LetsGo letsGo = new LetsGo();

this behaviour is disabled.

The property must be set before Spring is initialized.

When I do that with your sample project, the insert runs in 500ms on my laptop.
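Putting it all together, the whole fix is the one-liner above placed before any Spring JDBC class is loaded (as noted, the property must be set before Spring is initialized). A minimal sketch, with the `LetsGo` calls from the sample project shown only as comments:

```java
public class DisableParameterTypeLookup {
    public static void main(String[] args) {
        // Must happen before Spring is initialized; equivalent to passing
        // -Dspring.jdbc.getParameterType.ignore=true on the JVM command line.
        System.setProperty("spring.jdbc.getParameterType.ignore", "true");
        System.out.println(System.getProperty("spring.jdbc.getParameterType.ignore"));

        // ...then initialize Spring and run the batch insert as before, e.g.:
        // LetsGo letsGo = new LetsGo();
        // letsGo.run(Type.POSTGRES);
    }
}
```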


Edit

After seeing the comment regarding the use of the pgjdbc-ng driver, I dug into the sources of the "official" driver and the NG driver. The NG driver does cache the parameter meta data after the first call, whereas the official driver does not, which explains why using the NG driver is so much faster (even without disabling the call in Spring).

  • Thanks! I believe this is a perfect answer. Though when I set the property I get a "DataIntegrityViolationException" - not sure why. Did you have to change anything else besides adding the property? – varad Jun 24 '16 at 12:48
  • @varad: no, that was the only thing I changed. –  Jun 24 '16 at 12:48
  • Interesting, I'm getting "22P02: invalid input syntax for integer". But I guess that's some other kind of issue. – varad Jun 24 '16 at 12:50
  • I figured it out. The code which was parsing my XML and converting the values to Java types was setting an Integer into a text field, and hence it failed. So it's not related to this Stack Overflow question at all. The good news is that with that property set it is really fast! Thanks for investigating the issue, it was really helpful! – varad Jun 24 '16 at 12:57

Try using the pgjdbc-ng driver and then compare your results.

It is available here: http://impossibl.github.io/pgjdbc-ng/
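For a quick comparison, little needs to change besides the driver jar on the classpath: pgjdbc-ng uses its own `jdbc:pgsql:` URL scheme instead of the official driver's `jdbc:postgresql:`. A small sketch (host, port, database and credentials are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class NgDriverUrl {
    // pgjdbc-ng URL scheme; host/port/database are placeholders.
    static final String NG_URL = "jdbc:pgsql://localhost:5432/testdb";

    public static void main(String[] args) {
        System.out.println(NG_URL.startsWith("jdbc:pgsql:"));
        // With the pgjdbc-ng jar on the classpath, the connection opens as usual:
        // try (Connection conn = DriverManager.getConnection(NG_URL, "user", "password")) { ... }
    }
}
```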

Varshney P.

I hope you are using a DB connection pool. You can try C3P0. Spring's JdbcTemplate doesn't provide a connection pool implementation.
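A minimal C3P0 configuration sketch might look like this (URL and credentials are placeholders; `ComboPooledDataSource` is C3P0's own class and opens connections lazily, so nothing connects until the first query):

```java
import com.mchange.v2.c3p0.ComboPooledDataSource;
import org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate;

// Configuration sketch only - host, database and credentials are placeholders.
ComboPooledDataSource ds = new ComboPooledDataSource();
ds.setDriverClass("org.postgresql.Driver"); // throws PropertyVetoException
ds.setJdbcUrl("jdbc:postgresql://localhost:5432/testdb");
ds.setUser("user");
ds.setPassword("password");
ds.setMinPoolSize(5);
ds.setMaxPoolSize(20);

// Hand the pooled DataSource to Spring instead of a plain one:
NamedParameterJdbcTemplate template = new NamedParameterJdbcTemplate(ds);
```

Note, however, that pooling mainly saves connection setup cost; it would not by itself explain a 17-second gap between two databases over the same pool-less code path.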

user2253863