3

Here's the basic idea of what I want to do in SSIS:

I have a large query against a production Oracle database, and I need the following where clause that brings in a long list of ids from SQL Server. From there, the results are sent elsewhere.

select ... 
from Oracle_table(s) --multi-join
where id in ([select distinct id from SQL_SERVER_table])

Alternatively, I could write the query this way:

select ...
from Oracle_table(s) --multi-join
...
join SQL_SERVER_table sst on sst.ID = Oracle_table.ID

Here are my limitations:

  • The Oracle query is large and cannot be run without the where id in (... clause
    • This means I cannot run the Oracle query, then join it against the ids in another step. I tried this, and the DBA's killed the temp table after it became 3 TB in size.
  • I have 160k id's
    • This means it is not practical to iterate through the id's one by one. In the past, I have run against ~1000 IDs, using a comma-separated list. It runs relatively fast - a few minutes.
  • The main query is in Oracle, but the ids are in SQL Server
  • I do not have the ability to write to Oracle

I've found many questions like this.
None of the answers I have found have a solution to my limitations.

Similar question:

Hadi
  • 36,233
  • 13
  • 65
  • 124
ColinMac
  • 620
  • 2
  • 9
  • 18
  • What exactly are these Ids? 160k x 4 bytes (for int) doesn't seem close to 3TB. – AlwaysLearning Aug 19 '21 at 01:44
  • The ID's are alpha-numeric account numbers. They're not large. The query against Oracle is pulling in 50+ columns, 20+ million rows of data. – ColinMac Aug 19 '21 at 15:21
  • Are you allowed to use the Oracle equivalent of [linked servers](https://dba.stackexchange.com/questions/163075/what-is-oracle-equivalent-for-linked-server-and-can-you-join-with-sql-server)? You could then create a view on the Oracle server that queries the linked server for the IDs. – SMM Aug 24 '21 at 03:05
  • @SMM, I don't have write-access in the Oracle table. – ColinMac Aug 24 '21 at 18:51
  • @ColinMac, how many columns are you retrieving from the Oracle table ? – Roberto Hernandez Aug 25 '21 at 06:12
  • @RobertoHernandez: I think 50+ columns. It's a large fact database query (Oracle DB), restricted by my list of ID's from SQL SERVER. – ColinMac Aug 25 '21 at 18:53
  • 1
    Assuming the ID's are not contiguous values, how about paging the Id's coming from SQL into batches of say 1000 and looping ~160 times using `in ()` to pull all of the rows? – allmhuran Aug 26 '21 at 01:05
  • Alternatively, if the Id's are "sometimes contiguous" you could calculate contiguous batches of id's in sql, and then loop query against oracle using a `between` style construct. – allmhuran Aug 26 '21 at 01:26
  • 1
    @allmhuran: We created an object from the SQL SERVER ID's, and iterated through them. It's very cumbersome with a lot of slow overhead steps for SSIS. Not ideal, but it worked in the end. Well, it actually died half way through and finished the remaining ID's on second run. – ColinMac Aug 26 '21 at 01:48
  • Yeah, one at a time from a sql database is always going to be painful. Batch iteration with `in ()` or `between` on calculated contiguous ranges (depending on what your id spread looks like) is definitely going to be faster, but if you made it, you made it! :) – allmhuran Aug 26 '21 at 02:11

1 Answers1

3

To prevent loading all rows from the Oracle table. The only way is to apply the filter in the Oracle database engine. I don't think this can be achieved using SSIS since you have more than 160000 ids in the SQL Server table, which cannot be efficiently loaded and passed to the Oracle SQL command:

  • Using Lookups and Merge Join will require loading all data from the Oracle database
  • Retrieving data from SQL Server, building a comma-separated string, and passing it to the Oracle SQL command cannot be done with too many IDs (160K).
  • The same issue using a Script Task.
  • Creating a Linked Server in SQL Server and Joining both tables will load all data from the Oracle database.

To solve your problem, you should search for a way to create a link to the SQL Server database from the Oracle engine.

Oracle Heterogenous Services

I don't have much experience in Oracle databases. Still, after a small research, I found something in Oracle equivalent to "Linked Servers" in SQL Server called "heterogeneous connectivity".

The query syntax should look like this:

select * 
from Oracle_table
where id in (select distinct id from SQL_SERVER_table@sqlserverdsn)

You can refer to the following step-by-step guides to read more on how to connect to SQL Server tables from Oracle:

Importing Data from SQL Server to a staging table in Oracle

Another approach is to use a Data Flow Task that imports IDs from SQL Server to a staging table in Oracle. Then use the staging table in your Oracle query. It would be better to create an index on the staging table. (If you do not have permission to write to the Oracle database, try to get permission to a separate staging database.)

Example of exporting data from SQL Server to Oracle:

Minimizing the data load from the Oracle table

If none of the solutions above solves your issue. You can try minimizing the data loaded from the Oracle database as much as possible.

As an example, you can try to get the Minimum and Maximum IDs from the SQL Server table, store both values within two variables. Then, you can use both variables in the SQL Command that loads the data from the Oracle table, like the following:

SELECT * FROM Oracle_Table WHERE ID > @MinID and ID < @MaxID

This will remove a bunch of useless data in your operation. In case your ID column is a string, you can use other measures to filter data, such as the string length, the first character.

Hadi
  • 36,233
  • 13
  • 65
  • 124