I think I am facing a common problem when moving data between two servers but I can't find a good solution. I have two servers: server1
and server2
. I want to move a subset of a table in server1
to a table in server2
. I have the following tables (source and sink tables have the same schema):
server1.dbo.source
- Contains ID + other columns.server2.dbo.sink
- Contains ID + other columns (same schema assource
table).server2.dbo.utilizedIds
- Contains one column with list of IDs I would like to move.
If I could reference both servers in the same query, the data that I am trying to move from server1
to server2
could be described by this SQL statement:
SELECT * FROM server1.dbo.source
WHERE server1.dbo.source.id IN (SELECT * FROM server2.dbo.utilizedIds)
What I am trying to achieve is to only move IDs
that I need. These IDs
are dynamic and stored in a table in my destination server. I am doing this because I need a really small subset of the source table (I need thousands of rows out of billions). So far, these are the solutions that I have found and the reasons why I can't use them or I wouldn't like to use them:
- Copy all the data into a temp table in
server2
and perform the filtering there - Moving this much data every time I need to update my sink is not feasible. - Perform a delta insert of the source table into
server2
and then filter this table - While this would be more performant than option 1, I really don't want to incur the cost of storing billions of rows when I only need thousands. - Create temp tables/persistent tables in
server1
and insert theIDs
into this table, then filter using standard SQL - I am not the owner ofserver1
, I cannot create tables or stored procedures. The temp table option was promising but temp tables don't persist across activities. - Create a dynamic query and pass the
IDs
as a string to theSELECT
statement. Something like this:SELECT * FROM server1.dbo.source WHERE id IN ('ID1', 'ID2', 'ID3' ...)
- So far, this is my best solution. However, SQL statements have a length and parameter limit, and this is not a good practice
To me, this seems like a common use case. What is the best practice or correct solution to this problem? Building my query using strings feels like a hack and not something ADF was designed for.