0

We have two data sources that We need to consolidate into one. We need to keep the most recent rows. The sources have the same columns. The key is Machine column.

Input :

tFileInputDelimited A
+---------+------+-------------------+
| Machine | Desc | LastScanTimestamp |
+---------+------+-------------------+
| M01     | AA   |                25 |
| M02     | AB   |                23 |
| M03     | AC   |                28 |
+---------+------+-------------------+

tFileInputDelimited B
+---------+------+-------------------+
| Machine | Desc | LastScanTimestamp |
+---------+------+-------------------+
| M02     | BB   |                25 |
| M03     | BC   |                27 |
| M04     | BD   |                26 |
+---------+------+-------------------+

Output wanted :

+---------+------+-------------------+
| Machine | Desc | LastScanTimestamp |
+---------+------+-------------------+
| M01     | AA   |                25 |
| M02     | BB   |                25 |
| M03     | AC   |                28 |
| M04     | BD   |                26 |
+---------+------+-------------------+

Our question is similar to this SQL question: SQL query to get most recent row for each instance of a given key

We can't use a SQL query to do it. We found a way to do it with tUnite, tAggregateRow and tMap. This is not really elegant and maintainable.

Thanks a lot.

Community
  • 1
  • 1
maxrous
  • 1
  • 2

1 Answers1

0

I found an anwser on Talend Forum : https://www.talendforge.org/forum/viewtopic.php?id=49655

tInputFileDelimited--->tUnite--->tSortRow--->tUniqRow--->tLogRow
                     /                           
tInputFileDelimited-/

tSortRow : Sort by LastScanTimestamp desc

tUniqRow : Key = Machine

maxrous
  • 1
  • 2