3

I am migrating person from a view to my database with the following query, which is too slow:

SELECT DISTINCT PI.VALUE ID
FROM PERSON_VIEW PERV
    inner join PERSON_IDENT PI on PI.VALUE = PERV.IDENTITY
    inner join PERSON PER on PER.ID = PI.EXTERNAL_ID
WHERE NOT EXISTS (SELECT RECORD_ID FROM PERSON_MIGR_DATA PMD
                  WHERE RECORD_ID = PERV.RECORD_ID)
  AND NOT EXISTS (SELECT RECORD_ID FROM PERSON_MIGR_ERRORS
                  WHERE RECORD_ID = PERV.RECORD_ID)

There are indexes on joined fields, but I must change this query, as it lasts 15minutes.. Tables are quite big.

So. I have PERSON_VIEW - from which I read person to migrate, PERSON_MIGR_DATA - id of migrated persons, and PERSON_MIGR_ERRORS - id of records with person thath didn't pass validation during migration.

I think, that the NOT EXIST is the bottleneck. But how could make this query so that it would be faster ?

jarlh
  • 42,561
  • 8
  • 45
  • 63
michealAtmi
  • 1,012
  • 2
  • 14
  • 36
  • 1
    Have you tried making the link to person_migr_data a left join and then checking for fields where the respective field is NULL like `LEFT JOIN PERSON_MIGR_DATA PMD ON PMD.RECORD_ID = PERV.RECORD_ID` and then putting `WHERE PMD.RECORD_ID IS NULL` in the WHERE? With the same principle for the other NOT EXISTS – Magisch Jul 04 '18 at 07:54
  • 1
    "Distinct" - oh my, oh my... He is one of the "bottleneck" – Ychdziu Jul 04 '18 at 07:57
  • @Ychdziu: I don't follow... If the OP is pulling a value that may not be distinct... – Paul Jul 04 '18 at 08:02
  • 1
    @Paul: when people use "Distinct" - it often indicates, that people doesn't understand the data, they are querying. The data is taken from the view - there is a select statement in it - take from there, what you need, that you don't have to use "Distinct". – Ychdziu Jul 04 '18 at 08:06
  • 1
    @Ychdziu: Err... That's a big assumption, indicating that you believe the OP doesn't know his own data. The problem here, is that *we* don't know *his* data... – Paul Jul 04 '18 at 08:08
  • 1
    @Ychdziu Failing additional details from OP, we don't know if it's actually a view or not or other constraints OP might have with his data retrieval. Maybe the original data is just poorly deduplicated and normalized. The distinct keyword in and of itself might be necessary in the case at hand. Directly going to assuming OP is incompetent is somewhat unkind. – Magisch Jul 04 '18 at 08:08
  • 1
    Have you tried the query as a `NOT IN` table join? Do you get any better performance from this? (Even taking [this post](https://stackoverflow.com/questions/173041/not-in-vs-not-exists) into consideration. – Paul Jul 04 '18 at 08:11
  • 1
    @Paul,Magisch : I don't deny that, its all in assumptions. OP stated, that its a view, and he wants to improve hes statement - so one of the option is to dig deeper in the view, and take from there whats is important. Work for that data! Lol. – Ychdziu Jul 04 '18 at 08:12
  • Thank u guys, JOIN is a way faster!! – michealAtmi Jul 04 '18 at 08:33

1 Answers1

7

You coul try using a pair of left join for not matching (instead of not exist)

SELECT DISTINCT PI.VALUE ID
FROM PERSON_VIEW PERV
inner join PERSON_IDENT PI on PI.VALUE = PERV.IDENTITY
inner join PERSON PER on PER.ID = PI.EXTERNAL_ID
LEFT JOIN PERSON_MIGR_DATA PMD ON  PMD.RECORD_ID = PERV.RECORD_ID
LEFT JOIN PERSON_MIGR_ERRORS PME ON  PME.RECORD_ID = PERV.RECORD_ID
where PMD.RECORD_ID is null and PME.RECORD_ID is null 
ScaisEdge
  • 131,976
  • 10
  • 91
  • 107