
I'm trying to create a transformation that reads CSV files and checks the data type of each field in those CSVs.

Like this: the standard is that field A should be a string(1) (one character) and field B should be an integer/number.

And what I want is to check/validate: if A is not string(1), then set Status = Not Valid, and likewise if B is not an integer/number. Then every file with status Not Valid will be moved to an error folder.

I know I can use the Data Validator step to do the check, but how do I move the files based on that status? I can't find any step to do it.

Rio Odestila
  • Possible duplicate of [PDI - Read CSV Files, if missing field/data then move to the next file](https://stackoverflow.com/questions/51393492/pdi-read-csv-files-if-missing-field-data-then-move-to-the-next-file) – AlainD Jul 30 '18 at 14:15
  • However, the current answer is more complete than the answer to the original duplicate. – AlainD Jul 31 '18 at 09:04

2 Answers


You can read the files in a loop and add steps as below.

After the Data Validator, filter the rows with a negative result (not matched) -> Add constants step with error = 1 -> Set Variables step for the error field (default value 0).

After the transformation finishes, add a Simple Evaluation entry in the parent job to check the value of the ERROR variable.

If it has the value 1, then move the files, else ....
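If you prefer to raise the flag from a script instead of the Add constants / Set Variables pair, the Modified Java Script Value step can set the variable directly. This is only a sketch, assuming the variable is named ERROR and should be visible up to the root job:

```
// Sketch: body of a Modified Java Script Value step placed on the
// "not matched" branch after the filter. The variable name ERROR and
// the "r" (root job) scope are assumptions, not part of the answer above.
setVariable("ERROR", "1", "r");
```

The Simple Evaluation entry in the parent job can then compare ${ERROR} against 1 and branch to the file-move entry.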

I hope this can help.

Niraj

You can do the same as in this question. Once the file is read, use a Group by to get one flag per file. However, this time you cannot do it in one transformation; you should use a job.

Your use case is covered by the samples shipped with your PDI distribution. The sample is in the folder your-PDI/samples/jobs/run_all. Open Run all sample transformations.kjb and replace the Filter 2 of Get Files - Get all transformations.ktr with your logic, which includes a Group by so that you have one status per file and not one status per row.

In case you wonder why you need such complex logic for such a task, remember that PDI starts all the steps of a transformation at the same time. That is its great power, but it also means you cannot know whether you have to move the file before every row has been processed.

Alternatively, you have the quick-and-dirty solution from your similar question: change the Filter rows to a type check, and the final Synchronize after merge to a Process files / Move.

And a final piece of advice: instead of checking the type with a Data Validator, which is a good solution in itself, you may use a JavaScript step as there. It is more flexible if you need to maintain it in the long run.
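For instance, a Modified Java Script Value step could do the same check. This is only a sketch under the assumptions of the question (fields named A and B read in as strings, and a new status field declared in the step's output grid):

```
// Sketch: type check inside a Modified Java Script Value step.
// A and B are the incoming CSV fields (assumed read as strings);
// "status" has to be declared in the step's output fields so it is
// appended to the stream.
var a = (A == null) ? "" : "" + A;   // coerce to a JavaScript string
var b = (B == null) ? "" : "" + B;

var aOk = (a.length == 1);               // A must be exactly one character
var bOk = /^-?\d+(\.\d+)?$/.test(b);     // B must look like a number

var status = (aOk && bOk) ? "Valid" : "Not Valid";
```

A downstream Filter rows (or the Group by described above) can then act on status exactly as it would on the Data Validator result.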

AlainD
  • How can `Group by` set a flag for each file? I know from the previous question that I can use `Group by` or `Memory Group by`, but I don't know how to implement it. I need it step by step to understand – Rio Odestila Jul 31 '18 at 01:55
  • It would really help me to understand if you could give me a sample of your solution as a transformation (.ktr file) – Rio Odestila Jul 31 '18 at 02:14