How does an indirect load in Informatica work internally? Does it collate all the data and then process it, or does it process one file at a time? If I have duplicates spanning multiple files, will the duplicate-removal logic in my mapping remove them, or would I have to merge the files using a Union transformation first and then pass the data through the duplicate-removal logic?
Viewed 882 times
3 Answers
1
As far as I know, Informatica processes the data as if it were a single file. So yes, it should remove the duplicates across files.
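A minimal shell sketch (not Informatica itself, just an analogy) of what this answer claims an indirect load effectively does: the listed files are read as one continuous stream, so duplicates spanning files meet in the same pipeline and can be removed in a single pass. File names and contents here are made up for illustration.

```shell
printf 'a\nb\n' > f1.txt        # f1.txt holds rows a, b
printf 'b\nc\n' > f2.txt        # f2.txt repeats row b
cat f1.txt f2.txt | sort -u     # one stream, deduplicated across both files
# a
# b
# c
```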

Samik
1
Informatica reads the stream as if it were a single file. It's like doing a cat on a filename with a wildcard: e.g. if there are two files, f1.txt containing testlineA and f2.txt containing testlineB, and you run the command cat f*.txt, you should get:
testlineA
testlineB
Just like if it were coming from one file.
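The cat analogy above can be reproduced directly; the file names and contents are the ones given in the answer:

```shell
printf 'testlineA\n' > f1.txt   # first source file
printf 'testlineB\n' > f2.txt   # second source file
cat f*.txt                      # one continuous stream
# testlineA
# testlineB
```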

Maciejg
Correct. Please note that the filenames of the individual files are available if you enable a special port. Quite useful if you add the file name to the target DB for added traceability – Lars G Olsen Mar 27 '17 at 20:57
-1
As long as your pipeline has an active transformation (e.g. a Sorter) before you actually filter out the duplicates, all records will have arrived at the active transformation before moving on to the filter, and the matter will be moot.
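A hedged shell analogue of the Sorter-then-filter pattern this answer describes: sort is blocking, emitting nothing until every row has arrived, so a downstream step that only compares adjacent rows still catches duplicates regardless of which file they came from. The file name and rows below are invented for illustration.

```shell
# sort = blocking "active" step; uniq = duplicate filter on adjacent rows
printf 'x\ny\nx\n' > rows.txt   # the duplicate of x arrives last
sort rows.txt | uniq            # all rows sorted first, then deduplicated
# x
# y
```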

Daniel Machet
- 615