0

I will try to explain the problem on an abstract level first:

I have X amount of data as input, which is always going to have a field DATE. Before, the dates that came as input (after some process) where put in a table as output. Now, I am asked to put both the input dates and any date between the minimun date received and one year from that moment. If there was originally no input for some day between this two dates, all fields must come with 0, or equivalent.

Example. I have two inputs. One with '18/03/2017' and other with '18/03/2018'. I now need to create output data for all the missing dates between '18/03/2017' and '18/04/2017'. So, output '19/03/2017' with every field to 0, and the same for the 20th and 21st and so on.

I know to do this programmatically, but on powercenter I do not. I've been told to do the following (which I have done, but I would like to know of a better method):

Get the minimun date, day0. Then, with an aggregator, create 365 fields, each has that "day0"+1, day0+2, and so on, to create an artificial year.

After that we do several transformations like sorting the dates, union between them, to get the data ready for a joiner. The idea of the joiner is to do an Full Outer Join between the original data, and the data that is going to have all fields to 0 and that we got from the previous aggregator.

Then a router picks with one of its groups the data that had actual dates (and fields without nulls) and other group where all fields are null, and then said fields are given a 0 to finally be written to a table.

I am wondering how can this be achieved by, for starters, removing the need to add 365 days to a date. If I were to do this same process for 10 years intead of one, the task gets ridicolous really quick.

I was wondering about an XOR type of operation, or some other function that would cut the number of steps that need to be done for what I (maybe wrongly) feel is a simple task. Currently I now need 5 steps just to know which dates are missing between two dates, a minimun and one year from that point.

I have tried to be as clear as posible but if I failed at any point please let me know!

monkey intern
  • 705
  • 3
  • 14
  • 34

3 Answers3

1

Im not sure what the aggregator is supposed to do? The same with the 'full outer' join? A normal join on a constant port is fine :) c

Can you calculate the needed number of 'dublicates' before the 'joiner'? In that case a lookup configured to return 'all rows' and a less-than-or-equal predicate can help make the mapping much more readable.

In any case You will need a helper table (or file) with a sequence of numbers between 1 and the number of potential dublicates (or more) I use our time-dimension in the warehouse, which have one row per day from 1753-01-01 and 200000 next days, and a primary integer column with values from 1 and up ...

Lars G Olsen
  • 1,093
  • 8
  • 11
  • Okay, that last idea seems really smart to me, so thanks for that, might use it at some point. As for the can I calculate the number of duplicates before the joiner. That is part of what I'm missing how. I am going to get a number of inputs. But before the join, I don't know what I am missing. If it can be done, with powercenter, I don't know how. In a programming language I would create a data structure, a simple array even, put the inputs in the arrays position, and whatever is empty is what I am missing. Here, not sure at all :S – monkey intern Apr 21 '17 at 06:59
  • I Think you'll need to read the source twice: first time, just to identify the size of each gap, the second time to join the result of the first 'pass' to the incoming data and generate the dublicates. – Lars G Olsen Apr 21 '17 at 17:37
1

You've identified you know how to do this programmatically and to be fair this problem is more suited to that sort of solution... but that doesn't exclude powercenter by any means, just feed the 2 dates into a java transformation, apply some code to produce all dates between them and for a record to be output for each. Java transformation is ideal for record generation

Daniel Machet
  • 615
  • 1
  • 5
  • 7
  • Right. As far as I know, it is forbidden within the company to do java transformations. By policy, it is not accepted. Pity to learn that that is trully the best option :(( – monkey intern Apr 21 '17 at 06:57
0

You've identified you know how to do this programmatically and to be fair this problem is more suited to that sort of solution... but that doesn't exclude powercenter by any means, just feed the 2 dates into a java transformation, apply some code to produce all dates between them and for a record to be output for each. Java transformation is ideal for record generation

Ok... so you could override your source qualifier to achieve this in the selection query itself (am giving Oracle based example as its what I'm used to and I'm assuming your data in is from a table). I looked up the connect syntax here

SQL to generate a list of numbers from 1 to 100

SELECT (MIN(tablea.DATEFIELD) + levquery.n - 1) AS Port1 FROM tablea, (SELECT LEVEL n FROM DUAL CONNECT BY LEVEL <= 365) as levquery

(Check if the query works for you - haven't access to pc to test it at the minute)

Community
  • 1
  • 1
Daniel Machet
  • 615
  • 1
  • 5
  • 7