3

I have SPSSmodeler stream which is now used and updated every week constantly to generate a certain dataset. A raw data for this stream is also renewed on a weekly basis.

In part of this stream, there is a chunk of nodes that were necessary to modify and update manually every week, and the sequence of this part is below: Type Node => Restructure Node => Aggregate Node

To simplify the explanation of those nodes' role, I drew an image of them as bellow.

enter image description here

Because the original raw data is changed weekly basis, the range of Unit value above is always varied, sometimes more than 6 (maybe 100) others less than 6 (maybe 3). That is why somebody has to modify there and update those chunk of nodes on a weekly basis until now. *Unit value has a certain limitation (300 for now)

However, now we are aiming to run this stream automatically without touching any human operations on it that we need to customize there to work perfectly, automatically. Please help and will appreciate your efforts, thanks!

htshame
  • 6,599
  • 5
  • 36
  • 56
Toshi
  • 45
  • 7
  • Is there a node that takes a dataset, and gives you the maximum value of a given column? Alternatively, just set the count of "Unit n" columns to 300 – Caleth Oct 10 '18 at 11:09
  • Is there a node that takes a dataset, and gives you the maximum value of a given column? : Type Node is playing its role. just set the count of "Unit n" columns to 300: It is impossible for me right now as the maximum value of Unit Value read by Type Node is always changing. I'd like to insert dummy records under the group of each name group, but don't know how. – Toshi Oct 10 '18 at 12:54
  • Ok, I went and looked at the docs: you want to have another Aggregate node, which has one group, and outputs the Max of the Unit column. You then want to take the result of that, and feed it into the Restructure node's parameter such that it makes that many columns – Caleth Oct 10 '18 at 13:01
  • It also looks like you can replace this whole mess with a Matrix node, see example at https://www.ibm.com/developerworks/community/forums/html/topic?id=8bd95cbf-8409-4579-ba28-1a1c37c7912a – Caleth Oct 10 '18 at 13:04
  • The result of Type node is more than one value.: Sure. After Type Node read the data of each field(column) and appear the range of data in Unit field. Restructuring node use this range of data to set the new columns; Unit 1, Unit 2・・・. So if the original data set was changed, then the range of data also changed, following that the number of columns generated by Restructuring node is also differ. What I want to do is to fix the number of columns generated by Restructuring Node automatically. – Toshi Oct 10 '18 at 13:04
  • The operation you are trying to reproduce is called a Pivot. There is an out-of-the-box implementation already. You don't have to reproduce that. – Caleth Oct 10 '18 at 13:07
  • Thank you so much, Caleth. URL below is so helpful. Will try again by this way. http://datamininginsights.co.uk/2011/09/14/reverse-restructure-flagtoset/ – Toshi Oct 10 '18 at 13:39

2 Answers2

1

In order to automatize, I suggest to try to use global nodes combined with clem scripts inside the execution (default script). I have a stream that calculates the first date and the last date and those variables are used to rename files at the end of execution. I think you could use something similar as explained here:

1) Create derive nodes to bring the unit values used in the weekly stream

2) Save this information in a table named 'count_variable'

3) Use a Global node named Global with a query similar to this: @GLOBAL_MAX(variable created in (2)) (only to record the number of variables. The step 2 created a table with only 1 values, so the GLOBAL_MAX will only bring the number of variables).

4) The query inside the execution tab will be similar to this:


execute count_variable

var tabledata

var fn

set tabledata = count_variable.output

set count_variable = value tabledata at 1 1

execute Global


5) You now can use the information of variables just using the already creatde "count_variable"

It's not easy to explain just by typing, but I hope to have been helpful. Please mark as +1 in this answer if it was relevant one.

Érica Wong
  • 119
  • 6
  • That is the one I am looking for! And sorry to reply you late. As for the variable "count_variable" with the unit data in it by the global node, how I can replace the max value of unit column read by Type Node with this variable? – Toshi Nov 04 '18 at 07:51
  • No problem. If the tip helped you, plrase mark as plus one.You can use a global node that is the max value of that variable and refer to that in the new table. – Érica Wong Nov 05 '18 at 09:33
0

I think there is a better, simpler and more effective (yet risky, due to node's requirements to input data) solution to your problem. It is called Transpose node and does exactly that - pivot your table. But just from version 18.1 on. Here's an example: https://developer.ibm.com/answers/questions/389161/how-does-new-feature-partial-transpose-work-in-sps/

Julian
  • 154
  • 1
  • 11