1

I'm interested in getting and doing some processing on all the entities A returned by a query of the form:

 SELECT * FROM A a WHERE a.id not in (select b.id from B)

Where A is a "complex" entity in the sense that it inherits (InheritanceTyped.Joined) from other entities and that several of its attributes are other entities (@OneToOne and @ManyToOne).

The query itself takes a few minutes to yield results hence my desire to execute it as few as possible.

Here are the different approaches i tried to get those A elements as efficiently as possible :

  1. Pagination using setFirstResult/ setMaxResults Do the job, but pretty slowly as the query seems to be executed everytime.(around 50 elements processed/sec)
  2. Getting IDs first, A objects next Keeping all the IDs in memory is doable, so I execute once

    SELECT a.id FROM A a WHERE a.id not in (select b.id from B)
    

and then select a from A a WHERE a.id= :id, which goes relatively fast as the id column is indexed. This is currently the solution that is the most efficient with (around 100 elements processed/sec)

  1. Using ScollableResults I had high hope with this solution, but it ended up being slower than other alternatives, leaving me at around 20 elements processed/sec ...

As a neophyte, I don't know what other options to investigate, or if I did something wrong in any of my attempts.

Hence my questions:

  1. Are there (factually) other approaches to efficiently tackle this kind of problem ?
  2. Is it normal that ScrollableResults performed so poorly ? Is there something I should have paid attention to while implementing this solution?

EDIT: Here's the execution plan execution plan

Loïc Gammaitoni
  • 4,173
  • 16
  • 39
  • Your current query won't even run on Oracle, so maybe you should fix your question first. Saying `WHERE a.id not in (select * from B)` is nonsensical, because `SELECT *` in general would return more than one column (unless table `B` coincidentally happen to have just one column). – Tim Biegeleisen Feb 19 '19 at 07:48
  • ok, I changed the question accordingly. – Loïc Gammaitoni Feb 19 '19 at 07:53
  • 1
    *"getting and doing some processing on all the entities A"* - what sort of processing? – APC Feb 19 '19 at 08:14
  • 1
    It is hard to say if your `A` and `B` are joins of several tables each. But the simple cursor processing should be the efficientest one, because you get rid of the `ORM` overhead and the *N +1* problem. Some care must be taken though, you problem with cursor can be as simple as that the DB is optimized to get `FIRST ROW` so you do not wait on the first row but the throughput is low. Without seeing the real SQL and the [execution plan](https://stackoverflow.com/a/34975420/4808122) it is hard to say anything. – Marmite Bomber Feb 19 '19 at 08:29
  • @APC processing : calculations and insert in another table. – Loïc Gammaitoni Feb 19 '19 at 08:57
  • @MarmiteBomber: That was my thoughts as well from what I read online... I updated the question with the execution plan – Loïc Gammaitoni Feb 19 '19 at 08:57
  • The most efficient way of doing calculations and insertions into another table is to keep the data in the database. Nothing will perform better than a pure SQL `insert into ... select ... from` set operation. The bigger the tables the greater the impact of not doing it in the database. – APC Feb 19 '19 at 11:39
  • This won't do as what I insert into the DB is all processed data, and the processing being quite complex, I can't do it in SQL. I don't have too much problem with the time the insertion take. It's the time taken by the select that gives me trouble – Loïc Gammaitoni Feb 19 '19 at 11:54
  • Try converting your query to hibernate named query. Did you check the performance with Not Exists? – Atul Feb 19 '19 at 16:45

0 Answers0