In my project, I have to do some machine learning in C#. Unfortunately, ML.NET
is much less intuitive to me than the ML libraries in other languages, and I cannot get a RegressionExperiment to run.
First, here are my data classes:
public class DataPoint
{
    [ColumnName("Label")]
    public float y { get; set; }

    [ColumnName("catFeature")]
    public string str { get; set; }

    [ColumnName("smth")]
    public float smth { get; set; }
}

public class MLOutput
{
    [ColumnName("Score")]
    public float score { get; set; }
}
I think my problem lies in the encoding of a categorical variable. For a single model, the code below works fine.
// Create an MLContext
var ctx = new MLContext();
IDataView trainingData = ctx.Data.LoadFromEnumerable(data: data as IEnumerable<DataPoint>);

// Build the data processing and training pipeline
var pipeline = ctx.Transforms.Categorical.OneHotEncoding(outputColumnName: "catFeatureEnc", inputColumnName: "catFeature")
    .Append(ctx.Transforms.Concatenate("Features", new[] { "catFeatureEnc", "smth" }))
    .Append(ctx.Regression.Trainers.FastForest());

// Train the model (?)
var trainedModel = pipeline.Fit(trainingData); // Shouldn't we transform before fitting?
IDataView transformedData = trainedModel.Transform(trainingData);
However, when I remove the FastForest trainer from the pipeline and add the AutoML code, Microsoft.ML cannot handle the encoding:
// Build the data processing pipeline (no trainer this time)
var pipeline = ctx.Transforms.Categorical.OneHotEncoding(outputColumnName: "catFeatureEnc", inputColumnName: "catFeature")
    .Append(ctx.Transforms.Concatenate("Features", new[] { "catFeatureEnc", "smth" }))
    .Append(ctx.Transforms.Conversion.ConvertType("Features", "Features", DataKind.Single));

// Is there anything here to be fitted?
var trainedModel = pipeline.Fit(trainingData);
IDataView transformedData = trainedModel.Transform(trainingData); // Shouldn't we transform before fitting?

var experimentSettings = new RegressionExperimentSettings();
experimentSettings.MaxExperimentTimeInSeconds = 60;

// Token source so that the experiment can be cancelled via cts.Cancel()
var cts = new CancellationTokenSource();
experimentSettings.CancellationToken = cts.Token;

RegressionExperiment experiment = ctx.Auto().CreateRegressionExperiment(experimentSettings);
ExperimentResult<RegressionMetrics> experimentResult = experiment.Execute(transformedData, "Label");
Now, I get the following exception:
Only supported feature column types are Boolean, Single, and String. Please change the feature column catFeatureEnc of type Key<UInt32, 0-2> to one of the supported types.
If I remove catFeatureEnc from the Concatenate call, the code works fine. Alternatively, I tried to create a new pipeline for the training with the transformed data. Unfortunately, this approach doesn't work at all, as the new pipeline expects arbitrary data types for many features.
Another approach, passing the untransformed training data directly to the experiment:
ExperimentResult<RegressionMetrics> experimentResult = experiment.Execute(trainingData, "Label");
throws the exception:
Training failed with the exception: System.InvalidOperationException: Concatenated columns should have the same type. Column 'smth' has type of Single, but the expected column type is Byte
I don't understand this one: why is a Byte expected?
How can I use the encoded feature with ML.NET AutoML (Microsoft.ML.AutoML)?
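One direction I have not been able to confirm is to skip the manual one-hot encoding and hand AutoML the raw data, declaring each column's purpose through `ColumnInformation`. The sketch below assumes that AutoML applies its own featurization to columns listed in `CategoricalColumnNames`; I have not verified that this avoids both exceptions above. The names `AutoMlSketch` and `RunExperiment` are hypothetical and only serve the illustration.

```csharp
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

// Same data class as above, repeated so the sketch is self-contained.
public class DataPoint
{
    [ColumnName("Label")]
    public float y { get; set; }

    [ColumnName("catFeature")]
    public string str { get; set; }

    [ColumnName("smth")]
    public float smth { get; set; }
}

// Hypothetical helper class, not part of ML.NET.
public static class AutoMlSketch
{
    public static ITransformer RunExperiment(IEnumerable<DataPoint> data)
    {
        var ctx = new MLContext();

        // Load the raw, untransformed data; no manual OneHotEncoding here.
        IDataView trainingData = ctx.Data.LoadFromEnumerable(data);

        // Tell AutoML what each column is, instead of letting it infer the purposes.
        var columnInformation = new ColumnInformation { LabelColumnName = "Label" };
        columnInformation.CategoricalColumnNames.Add("catFeature");
        columnInformation.NumericColumnNames.Add("smth");

        var settings = new RegressionExperimentSettings { MaxExperimentTimeInSeconds = 60 };
        RegressionExperiment experiment = ctx.Auto().CreateRegressionExperiment(settings);

        // Assumption: AutoML featurizes the categorical column itself during the run.
        ExperimentResult<RegressionMetrics> result =
            experiment.Execute(trainingData, columnInformation);

        // The best run's model contains both the featurization and the trainer.
        return result.BestRun.Model;
    }
}
```

If custom preprocessing is still needed, the same `Execute` overload also accepts an optional `preFeaturizer` estimator, which might be a place for transforms that should run before AutoML's own featurization.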