0

Source & target tables are similar. Source-tMap-Target

Target table has a UUID field that is computed in tMap, however the flow should not insert duplicate persons in target i.e unique (firstname,lastname,dob,gender). I tried marking those columns as key in tMap as in below screenshot, but that does not prevent duplicate inserts. How can I avoid duplicate inserts without adding unique constraint on target?

tMap - setting keys on equivalent unique constraint columns

I also tried "using field" in target.

Field options

Edit: Solution as suggested below: enter image description here

Erik Philips
  • 53,428
  • 11
  • 128
  • 150
Espresso
  • 5,378
  • 4
  • 35
  • 66
  • 2
    Have you tried `tUniqRow`? – Kermit Feb 18 '16 at 23:54
  • 1
    you would have to handle this on your own..which means you have to read from target table too in your job and then do a lookup/left outer join to this target table to decide which row from source needs to go to insert and which to update... – garpitmzn Feb 19 '16 at 00:56
  • What do you want to happen in case of a duplicate? Should it do nothing, or update the duplicate record? – Brian R Armstrong Feb 19 '16 at 18:23

1 Answers1

1

The CDC components in the Paid version of Talend Studio for Data Integration undoubtedly address this.

In Open Studio, you'll can roll your own Change data capture based on the composite, unique key (firstname,lastname,dob,gender).

  • Use tUniqueRow on data coming from stage_geno_patients, unique on the following columns: firstname,lastname,dob,gender
  • Feed that into a tMap
  • Add another query as input to the tMap, to perform look-ups against the table behind "patients_test", to find a match on the firstname,lastname,dob,gender. That lookup should "Reload for each row" using looking up against values from the staging row
  • In the case of no-match, detect it and then do an insert of the staging row of data into the table behind "patients_test"

Q: Are you going to update information, also? Or, is the goal only to perform unique inserts where the data is not already present?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
dbh
  • 1,607
  • 13
  • 18