5

I am trying to fetch data from remote table. The data is expanded from seed set of data in local table using recursive CTE. The query is very slow (300 seed rows to 800 final rows takes 7 minutes).

For other "tiny local, huge remote"-cases with no recursive query the DRIVING_SITE hint works excellently. I also tried to export seed set from local table into auxiliary table on remotedb with same structure and - being logged in remotedb - ran query as pure local query (my_table as p, my_table_seed_copy as i). It took 4s, which encouraged me to believe forcing query to remote site would make query fast.

What's the correct way to force Oracle to execute recursive query on the remote site?

with s (id, data) as (
  select p.id, p.data
  from my_table@remotedb p
  where p.id in (select i.id from my_table i)
  union all
  select p.id, p.data
  from s
  join my_table@remotedb p on ...
)
select /*+DRIVING_SITE(p)*/ s.*
from s;

In the query above, I tried

  • select /*+DRIVING_SITE(p)*/ s.* in main select
  • select /*+DRIVING_SITE(s)*/ s.* in main select
  • omitting DRIVING_SITE in whole query
  • select /*+DRIVING_SITE(x)*/ s.* from s, dual@remotedb x as main select
  • select /*+DRIVING_SITE(p)*/ p.id, p.data in first inner select
  • select /*+DRIVING_SITE(p)*/ p.id, p.data in both inner selects
  • select /*+DRIVING_SITE(p) MATERIALIZE*/ p.id, p.data in both inner selects
  • (just for completeness - rewriting to connect by is not applicable for this case - actually the query is more complex and uses constructs which cannot be expressed by connect by)

All without success (i.e. data returned after 7 minutes).

Tomáš Záluský
  • 10,735
  • 2
  • 36
  • 64
  • In the query you posted, `p` isn't in scope for the hint. Does it make any difference if you put that hint *inside* the CTE - possibly in both branches? Or, perhaps, that plus a hint to materialize the CTE? (Just speculating... no real insight, and can't test.) – Alex Poole Mar 06 '21 at 17:59
  • @AlexPoole thanks, your advice is very interesting and it didn't occur to me. Unfortunately, neither works. I will update question with these attempts. – Tomáš Záluský Mar 06 '21 at 19:38
  • Try creating a view at the remote db with the cte, – gsalem Mar 06 '21 at 20:09
  • @gsalem thanks, currently I don't have clear idea how to incorporate seed set into the view. The anchor part of CTE should somehow reflect the seed set, otherwise the whole table becomes seed set. Parameterized view would help in this case but Oracle does not support it. I am currently trying to implement remote pipelined function. – Tomáš Záluský Mar 06 '21 at 20:20
  • Is your remote table huge and you consume very little part of it in the recursion? Maybe it is better to pull the data locally and do recursion? – astentx Mar 06 '21 at 20:27
  • @astentx remote table has 1.5 million rows, almost 200 columns, occupies 1.5GB. Local table is actually the docker container which extracts small subset from remote table. The part of data consumed by recursion is substantially smaller than whole table. Reported sample is one of the large samples, query - if run locally - is fast enough for such sample size. I guess pulling data locally would take even more time than running slow query, anyway thanks for your point. – Tomáš Záluský Mar 06 '21 at 20:56
  • Maybe your query can be rewritten with Oracle's standard [hierarchical query](https://docs.oracle.com/en/database/oracle/oracle-database/12.2/sqlrf/Hierarchical-Queries.html)? I think Oracle doesn't push the recursion to the remote server because this `with` refers to itself and is defined only locally (and cannot be merged to be presented as the "flat" query). Or pipelined function can help here to return the resulted hierarchical data – astentx Mar 06 '21 at 23:07
  • @astentx unfortunately it is practically impossible in this case. The CTE is stronger than `connect by` clause, in actual query I am utilizing window function over data from one recursion level. Yes, the `connect by` is useful otherwise, I use it on two other places in project and it is fast. I tend to agree with your statement about pushing recursion to the remote server. – Tomáš Záluský Mar 06 '21 at 23:22
  • One thing to see if the table in the remote DB is slowing the performance is to create a table by doing CTAS for the remote table, and then use this new table in your query, if your timing decreases then it could be the remote table is slowing the query. – Himanshu Kandpal Mar 07 '21 at 02:00
  • @hkandpal I did it vice versa - instead of copying large data set by doing CTAS for the remote table I copied small data set from local table to remote db (see second paragraph of question for more detail). I think it proved the db link is slowing query. It was much simpler than copying large table do local. Do you think it is worth trying anyway? Why? – Tomáš Záluský Mar 08 '21 at 11:19

1 Answers1

1

Recursive query actually performs breadth-first search - seed rows represent 0-th level and recursive part finds element on n-th level from elements on (n-1)-th level. Original query was intended to be part of merge ... using ... clause.

Hence I rewrote query to PLSQL loop. Every cycle generates one level. Merge prevents insertion of duplicates so finally no new row is added and loop exits (transitive closure is constructed). Pseudocode:

loop
  merge into my_table using (
    select /*+DRIVING_SITE(r)*/ distinct r.* /*###BULKCOLLECT###*/
    from my_table          l
    join my_table@remotedb r on ...  -- same condition as s and p in original question are joined on
  ) ...
  exit when rows_inserted = 0;
end loop;

Actual code is not so simple since DRIVING_SITE actually does not directly work with merge so we have to transfer data via work collection but that's different story. Also the count of inserted rows cannot be easily determined, it must be computed as difference between row count after and before merge.

The solution is not ideal. Anyway it's much faster than recursive CTE (30s, 13 cycles) because queries are provably utilizing the DRIVING_SITE hint.

I will leave question open for some time to wait if somebody finds answer how to make recursive query working or proving it is not possible.

APC
  • 144,005
  • 19
  • 170
  • 281
Tomáš Záluský
  • 10,735
  • 2
  • 36
  • 64