Optimization when merging from Oracle datalink

Question

I am trying to write an Oracle procedure to merge data from a remote datalink into a local table. Individually the pieces work quickly, but together they time out. Here is a simplified version of what I am trying.

What works:

Select distinct ProjectID from Project where LastUpdated < (sysdate - 6/24);

--Works in split second.

    Merge into project 
     using (select /*+DRIVING_SITE(remoteCompData)*/ 
            rp.projectID,
            rp.otherdata
            FROM Them.Remote_Data@DBLink rd
            WHERE rd.projectID in (1,2,3)) sourceData -- hardcoded IDs
    On (rd.projectID = project.projectID)

When matched...

-- Merge statement works quickly when the IDs are hard coded

What doesn't work: Combining the two statements above.

    Merge into project 
     using (select /*+DRIVING_SITE(rd)*/ -- driving site helps when this piece is extracted from the larger statement
            rp.projectID,
            rp.otherdata
            FROM Them.Remote_Data@DBLink rd
            WHERE rd.projectID in --in statement that works quickly by itself.
               (Select distinct ProjectID from Project where LastUpdated < (sysdate - 6/24)) 
-- This select in the in clause one returns 10 rows. Its a test database.
    On (rd.projectID = project.projectID)
    )

When matched...

-- When I run this statement in SQL Developer, this is all that I get without the data updating Connecting to the database local. Process exited. Disconnecting from the database local.

I also tried pulling out the in statement into a with statement hoping it would execute differently, but it had no effect.

Any direction for paths to pursue would be appreciated. Thanks.

Please read this post about [asking Oracle tuning questions on SO](https://stackoverflow.com/a/34975420/146325) — APC, Aug 13 '20 at 17:21
`Select distinct ProjectID from Project where LastUpdated < (sysdate - 6/24)` how many rows does this subquery return? — Sayan Malakshinov, Aug 13 '20 at 17:31
These are both test databases. The query that you ask about only returns 4 rows in my test scenario. The data on the other side has millions of rows. I am guessing that it is pulling all of those over before filtering on this query. — MattInVail, Aug 13 '20 at 18:27

score 1 · Answer 1 · answered Aug 13 '20 at 17:20

The /*+DRIVING_SITE(rd)*/ hint doesn't work with MERGE because the operation must run in the database where the merged table sits. Which in this case is the local database. That means the whole result set from the remote table is pulled across the database link and then filtered against the data from the local table.

So, discard the hint. I also suggest you convert the IN clause into a join:

Merge into project p
using (select rp.projectID,
              rp.otherdata
       FROM Project ld
            inner join Them.Remote_Data@DBLink rd
                     on rd.projectID = ld.projectID
      where ld.LastUpdated < (sysdate - 6/24)) q
-- This select in the in clause one returns 10 rows. Its a test database.
    On (q.projectID = p.projectID)
    )

Please bear in mind that answers to performance tuning questions without sufficient detail are just guesses.

My understanding was that the Driving_site hint only applied the subquery where it is in the select clause and this would happen before the merge, in which case it would help. If this is not true, then it seems like I need to find some way to make it happen. I only need a few dozen records from the millions in the remote view. If I move to a join, I still have the same issue which seems to be the heart of the problem. — MattInVail, Aug 13 '20 at 18:08
I rewrote it as a join with the same results. The select statement inside of using takes 0.8 seconds with the hint and 35 seconds without it when run outside of the merge statement. It never seems to complete at all inside the merge statement. — MattInVail, Aug 13 '20 at 18:38

score 0 · Answer 2 · answered Mar 03 '21 at 07:26

I found your question having same problem. Yes, the hint in query is ignored when the query is included into using clause of merge command.

In my case I created work table, say w_remote_data for your example, and splitted merge command into two commands: (1) fill the work table, (2) invoke merge command using work table.

The pitfall is, we cannot simply use neither of commands create w_remote_data as select /*+DRIVING_SITE(rd)*/ ... or insert into w_remote_data select /*+DRIVING_SITE(rd)*/ ... to fill the work table. Both of these commands are valid but they are slow - the hint does not apply too so we would not get rid of the problem. The solution is in PLSQL: collect result of query in using clause using intermediate collection. See example (for simplicity I assume w_remote_data has same structure as remote_data, otherwise we have to define custom record instead of %rowtype):

declare
  type ct is table of w_remote_data%rowtype;
  c ct;
  i pls_integer;
begin
  execute immediate 'truncate table w_remote_data';
  select /*+DRIVING_SITE(rd)*/ *
  bulk collect into c
  from Them.Remote_Data@DBLink rd ...;
  if c.count > 0 then
    forall i in c.first..c.last
      insert into w_remote_data values c(i);
  end if;
  merge into project p using (select * from w_remote_data) ...;
  execute immediate 'truncate table w_remote_data';
end;

My case was ETL script where I could rely it won't run in parallel. Otherwise we would have to cope with temporary (session-private) tables, I didn't try if it works with them.

Optimization when merging from Oracle datalink

2 Answers2

Linked