I'm looking for a way to apply aggregation methods within a PMML model (I have no specific example in mind; I just want to see whether it is possible).
The Transformations page of the PMML documentation has a passage on Aggregate, defined as a way to apply six functions: Count, Sum, Avg, Min, Max and Multiset.
Does this mean that a transformation inside a PMML model can collapse multiple rows of input data into a single row for prediction? I was unable to find such an example (or any example at all), while this post states that only single-row operations are supported within PMML.
Searching further, the sklearn2pmml library has an "Aggregator" transformer, but it only generates a transformation within a single row, such as averaging two columns.
This code:
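from sklearn_pandas import DataFrameMapper
from sklearn2pmml.decoration import ContinuousDomain
from sklearn2pmml.pipeline import PMMLPipeline
from sklearn2pmml.preprocessing import Aggregator

# Row-wise mean of two columns; no rows are collapsed
iris_pipeline = PMMLPipeline([
    ("mapper", DataFrameMapper([
        (["Sepal.Length", "Petal.Length"], [ContinuousDomain(), Aggregator(function = "mean")]),
    ])),
])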
generates a simple row-wise Apply transformation rather than an Aggregate element:
<TransformationDictionary>
    <DerivedField name="avg(Sepal.Length, Petal.Length)" optype="continuous" dataType="double">
        <Apply function="avg">
            <FieldRef field="Sepal.Length"/>
            <FieldRef field="Petal.Length"/>
        </Apply>
    </DerivedField>
</TransformationDictionary>
TL;DR:
Example of what I would like to achieve:
Is there any way to do this inside the PMML model, or should I perform such preprocessing myself before passing the data in for prediction?
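To make the second option concrete, here is a minimal sketch of the kind of pre-aggregation I have in mind, done in pandas before the data reaches the model. The grouping column "group_id", the output column names and the sample values are all made up for illustration; only the Sepal.Length and Petal.Length column names come from the pipeline above.

```python
import pandas as pd

# Hypothetical input: several measurement rows per entity.
# "group_id" is an assumed key column, and the values are invented.
raw = pd.DataFrame({
    "group_id": [1, 1, 2, 2, 2],
    "Sepal.Length": [5.0, 5.2, 6.0, 6.5, 7.0],
    "Petal.Length": [1.4, 1.6, 4.0, 4.5, 5.0],
})

# Collapse each group into a single row, using the same kinds of
# functions that the Aggregate passage lists (count, average, max).
features = raw.groupby("group_id").agg(
    n_rows=("Sepal.Length", "count"),
    sepal_mean=("Sepal.Length", "mean"),
    petal_max=("Petal.Length", "max"),
).reset_index()

print(features)
```

Each row of `features` would then be scored as an ordinary single-row input, so the PMML model itself would never see the multi-row data.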