1

I'm using SPSS modeler to calculate the accuracy of various weekly estimates. I have a data table with a couple weeks of estimates and a couple weeks of actual data points. Currently, to calculate the errors per week I have to use a derive node for every week ex: Wk1 error <- (Wk1 estimate - Wk1 actual). This is naturally inefficient when considering many weeks. Is there a way to derive all of these error columns at once?

In the data below, can we get 3 new columns for week1, week2, week3 error?

Sample Data

Som Naik
  • 11
  • 1

1 Answers1

0

I'm not sure if this is possible within the normal Modeler GUI as you would need to parse the respective field names and use them as parameters within a derive (multiple) node.

You could however use the Modeler Scripting Language. See the following script that creates a userinput node and fills it with some sample data and dynamically creates the derive nodes based on the weekX columns:

import modeler.api
import re

s = modeler.script.stream()

# create user input node and fill with sample data
input_node = s.create("userinput", "Userinput Node")
input_node.setPropertyValue("names", ["group", "week1 (estimate)", "week2 (estimate)", "week3 (estimate)", "week1 (actual)", "week2 (actual)", "week3 (actual)"])

input_node.setKeyedPropertyValue("data", "group", '"A" "B" "C" "D"')
input_node.setKeyedPropertyValue("data", "week1 (estimate)", '1 2 5 1')
input_node.setKeyedPropertyValue("data", "week2 (estimate)", '2 3 1 4')
input_node.setKeyedPropertyValue("data", "week3 (estimate)", '1 1 1 1')
input_node.setKeyedPropertyValue("data", "week1 (actual)", '1 3 2 5')
input_node.setKeyedPropertyValue("data", "week2 (actual)", '1 3 6 2')
input_node.setKeyedPropertyValue("data", "week3 (actual)", '1 1 1 1')

input_node.setKeyedPropertyValue("custom_storage", "group", "String")
input_node.setKeyedPropertyValue("custom_storage", "week1 (estimate)", "Integer")
input_node.setKeyedPropertyValue("custom_storage", "week2 (estimate)", "Integer")
input_node.setKeyedPropertyValue("custom_storage", "week3 (estimate)", "Integer")
input_node.setKeyedPropertyValue("custom_storage", "week1 (actual)", "Integer")
input_node.setKeyedPropertyValue("custom_storage", "week2 (actual)", "Integer")
input_node.setKeyedPropertyValue("custom_storage", "week3 (actual)", "Integer")

input_node.setPropertyValue("data_mode", "Ordered")

# alternatively (e.g. data from database, flatfile): find input node by id and read fieldlist
#input_node = s.findByID("id1UTWII7ZNZF")
#fieldlist = []
#for field in input_node.getOutputDataModel().iterator():
#    fieldlist.append(field)

# get fieldlist from input node
fieldlist = input_node.getPropertyValue("names")

print(fieldlist)

# loop through field list and assemble dict
weekdic = {}

p = re.compile('[wW]eek.?[0-9]')

for field in fieldlist:
    print(field)
    m = p.match(field)
    if m:
        try:
            weekdic[m.group()[len(m.group())-1]].append(field)
        except:
            weekdic[m.group()[len(m.group())-1]] = [field]

derive = False

# loop through dic and create derive node and set formula 
    
for i in weekdic:
    derive_temp = s.create("derive", "My node"+i)
    if derive:
        s.link(derive,derive_temp)
        derive = derive_temp
    else:
        derive = derive_temp
        s.link(input_node,derive)

    derive.setPropertyValue("new_name", "week" + i + " (error)")
    derive.setPropertyValue("result_type", "Formula")
    derive.setPropertyValue("formula_expr", "'" + weekdic[i][0] + "'" + ' - ' + "'" + weekdic[i][1] + "'")

# link last derive node to outputtable
tablenode = s.createAt("table", "Results", 288, 64)
s.link(derive, tablenode)
results = []
tablenode.run(results)
NielsFlohr
  • 183
  • 7