3

I need to make a lookup between two tables T1(A,B,C) and T2(A,B,C,D,E) on column C to get all column B values that are matching :

T1 :

enter image description here

T2 :

enter image description here

When I choose Full cache Mode I get only the first matching row (I'm only interested by column B values): 12122 but I need to get also 12123 and 12124 because C matches also with these rows.

I've tried to use Partial and no cache modes by using custom query with inner join (which returns all needed rows when executing the query in SSMS) but doesn't return all rows and it's killing performence.

I've tried also the solution proposed here :

How to get unmatched data between two sources in SSIS Data Flow?

And it gives the same results as lookup plus, I need to redirect unmatched rows to new table.

MedEc
  • 169
  • 1
  • 14

2 Answers2

1

I don't think the cache mode will affect your result, and it is performance based. The ultimate explanation is:

•If there are multiple matches in the reference table, the Lookup transformation returns only the first match returned by the lookup query. If multiple matches are found, the Lookup transformation generates an error or warning only when the transformation has been configured to load all the reference dataset into the cache. In this case, the Lookup transformation generates a warning when the transformation detects multiple matches as the transformation fills the cache.

To get the matched B from T2, you can just use the SQL in OLD DB source(command), for example:

SELECT distinct T2.B 
FROM T1 as A 
INNER JOIN T2 as B
ON B.C = A.C
LONG
  • 4,490
  • 2
  • 17
  • 35
  • Thanks for your return, Did you mean Full cache mode ? yes, is good for performence but doesn't meet my need which is to fetch all matched rows. And I need lookup to can redirect unmatched rows to table T3 and this is impossible for the first load when table T2 is empty. – MedEc Aug 29 '19 at 14:19
  • why do you use `Lookup` when you have the reference table empty? and where are`Matched` and `Unmatched` rows coming from? – LONG Aug 29 '19 at 14:27
  • Rows are coming from table T1, first load the reference table T2 is empty so all unmatched rows will be inserted in reference table T2 and T3. Then, i will add a surrogate key based on columns B and C but I need to get all values of column B with the same value of C. Please let me know if it's not clear. – MedEc Aug 29 '19 at 14:51
  • `all unmatched rows will be inserted in reference table T2 and T3.` where did you get the `unmatched rows`? – LONG Aug 29 '19 at 15:02
  • From Table T1 which is my source table – MedEc Aug 29 '19 at 15:03
  • Compared to which table? – LONG Aug 29 '19 at 15:04
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/198671/discussion-between-long-and-medec). – LONG Aug 29 '19 at 15:05
-1

If LONG's answer does not address your needs, you'd need to write a Script Transformation, operating in asynchronous mode (1 row of input can yield 0 to many output rows)

If the source data/T1 wouldn't contain duplicate C values, then the pre-execute phase of the component could cache the result of column B & C from T2 into local memory. Then for each source row that flows through, you'd need to loop through the results and append the B values into the data flow.

This gets trickier if T1 can have duplicate-ish data as you'd need to be querying the target table for each row that flows through - but you'd also have to track the B/C values that have already rolled through as you might need to reference those Bs as well.

You can also evaluate a Merge Join as I think that allows multiple rows to be emitted but I'm guessing you'll have more control over performance with a script transformation.

Either way, when you pull T2 table in, write a custom query and only select the columns you need (B&C).

billinkc
  • 59,250
  • 9
  • 102
  • 159