
We need to improve the performance of reports generated from a PostgreSQL view. A SELECT of roughly 10k rows takes around 2-5 minutes, and this is the most time-consuming part of report generation. All reports run against the same view, Hibernate is used to connect to the database, and the query is executed as a native SQL query:

SQLQuery query = session.createSQLQuery(sql); // sql holds the native query text
List<Object> data = query.list();

The three report queries are similar, each selecting many columns:

SELECT amount, rate, channel, date ...etc 
FROM view 
WHERE channel = 'name'
  AND invoice_id = '1' 
  AND date < 12 Oct
  • Is it worth running those selects in parallel, or might performance degrade even further with a larger dataset?

  • I read that separate connections have to be used for parallel selects. Can somebody advise how this can be achieved with Hibernate?
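For what it's worth, here is a minimal sketch of the parallel pattern with the Hibernate call stubbed out: each task must open and close its own Session, since a Session and its underlying JDBC connection must not be shared across threads. The runReportQuery helper and the channel names are illustrative, not from the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelReports {
    // Stand-in for the real query; in actual code each call would do
    // try (Session s = sessionFactory.openSession()) { ... query.list() ... }
    // so every parallel task gets its own Session and JDBC connection.
    static List<String> runReportQuery(String channel) {
        return List.of(channel + "-row1", channel + "-row2");
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        List<Future<List<String>>> futures = new ArrayList<>();
        for (String channel : List.of("a", "b", "c")) {
            futures.add(pool.submit(() -> runReportQuery(channel)));
        }
        for (Future<List<String>> f : futures) {
            System.out.println(f.get()); // blocks until that task's rows arrive
        }
        pool.shutdown();
    }
}
```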

  • You didn't show us what the queries are. But depending on the queries, running them in parallel can help you reduce the time. https://codereview.stackexchange.com/questions/155253/splitting-a-list-of-numbers-into-multiple-lists – Juan Carlos Oropeza Nov 13 '19 at 19:16
  • @JuanCarlosOropeza here they are – ddzz Nov 13 '19 at 19:25
  • It's still not clear which queries would run in parallel. But if, for example, you run the same query with different `invoice_id` values, then parallelism would help you: from the PostgreSQL side you will get responses faster. I can't advise about Hibernate because I just use C#. – Juan Carlos Oropeza Nov 13 '19 at 19:29
  • By the way, 5 minutes for a 10k-row table is very slow. Did you define an index for your table? – Juan Carlos Oropeza Nov 13 '19 at 19:30
  • You should run an `EXPLAIN` on your queries like Juan mentions. Also consider upgrading PostgreSQL. PG 9 is quite dated and newer versions have a lot more options and safety baked in: https://why-upgrade.depesz.com/show?from=9.1&to=12&keywords= – Freiheit Nov 13 '19 at 19:48
  • Note that "Postgres 9" covers 7 **major** versions (the last of which introduced parallel query). – Nov 13 '19 at 20:28

1 Answer


This depends, but in principle running more queries in parallel is rarely the solution. There are at least a few steps you should take before going that route:

  1. Run EXPLAIN on the query and try to optimize the bottleneck; maybe there is a full table scan that can be avoided.
  2. Ensure that the three similar queries share the same execution plan. Your code doesn't use bind parameters, so you most likely end up with three different execution plans, which is wasteful.
  3. Configure the correct fetch size in Hibernate (the hibernate.jdbc.fetch_size property) and the JDBC driver. With 10,000 rows and a typical fetch size of 10 rows, the driver would still make 1,000 round trips to the database. If your database latency is 100 ms, that amounts to 100 seconds spent on network round trips alone.
  4. Optimize the Hibernate session for reporting queries by using a read-only session.
  5. Optimize the Hibernate query; maybe you don't need to load full entities and it's enough to fetch a few columns with a projection.
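The round-trip arithmetic in step 3 can be sanity-checked directly (a minimal sketch; the 10-row fetch size and 100 ms latency are the figures assumed above, not measured values):

```java
public class RoundTripCost {
    public static void main(String[] args) {
        int rows = 10_000;           // rows returned by the report query
        int fetchSize = 10;          // assumed small driver default, as in step 3
        double latencySeconds = 0.1; // assumed 100 ms per round trip

        int roundTrips = rows / fetchSize;                   // 1,000 round trips
        double networkSeconds = roundTrips * latencySeconds; // time lost to latency alone

        System.out.println(roundTrips + " round trips, "
                + networkSeconds + " s of network latency");
    }
}
```

Raising the fetch size to, say, 1,000 cuts this to 10 round trips. Note that with the PostgreSQL JDBC driver a non-zero fetch size only takes effect inside a transaction (autocommit off), otherwise the driver buffers the whole result set at once.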

To understand where the bottleneck is and what to optimize, try profiling the SQL query on the database server. You want to understand how much time it takes to:

  1. Send SQL query from the client
  2. Parse SQL query into an execution plan
  3. Execute the plan to produce rows
  4. Send the rows back to the client
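In PostgreSQL, steps 2 and 3 can be measured with EXPLAIN ANALYZE run directly against the database (a sketch based on the query in the question; the concrete date literal is an assumption, the question only says "12 Oct"):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT amount, rate, channel, date
FROM view
WHERE channel = 'name'
  AND invoice_id = '1'
  AND date < '2019-10-12';  -- assumed concrete form of the date filter
```

The reported "Planning Time" and "Execution Time" cover steps 2 and 3; the difference between those and the wall-clock time observed in the Java client is mostly steps 1 and 4, i.e. the network transfer.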

A good design approach would be to set up a read-only replica of your database just for reporting. This lets you split OLTP processing from OLAP processing and optimize each side separately, which is much easier.

Karol Dowbecki
  • Thank you for the answer. Sorry, I hadn't mentioned that the query runs as a native SQL query, so N+1 is not the case here. – ddzz Nov 13 '19 at 19:52