
I'm new to machine learning and ML.NET (both the coding side and Model Builder). I've written code to train and predict (relatively simple examples against our data), but thought it would be best to use Model Builder since it picks the appropriate models to train.

I'm using the Data Classification scenario in Model Builder. I have a dataset (from SQL Server) that trains successfully, but I wanted to use a different version of the dataset (same schema, different data). With this other dataset, I now get the error "Trial 0 encounter error with message: Must be at least 2", and I've not been able to find any information about it. I've compared the two datasets (column types, null values, and the Advanced data options, to make sure they are the same): the original one that trains and the new one that throws this exception appear to be identical other than the data itself.

I went as far as using Telerik JustDecompile to see where in the ML code (Microsoft.ML.Trainers, LinearMulticlassModelParametersBase) this error is thrown. I understand there are two types of data classification scenario: binary and multiclass. I have a column defined as the label whose values should be either 1 or 0.

I'd appreciate any help. Hopefully someone can point me in the right direction. I've been analyzing the dataset that works and the one that doesn't for a number of days and cannot find the difference. Does Model Builder use different algorithms based on the actual data being trained, even when the schema is the same?

I'm going to try using these same 2 datasets through code (not using the model builder).

Thanks. Tom

T. DeVoe

2 Answers


Did your label column have more than two categories in the original dataset? It's possible the multiclass trainer is seeing fewer distinct label values in the new dataset than it requires.
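One quick way to answer that question is to count the distinct label values in each dataset. The sketch below is only an illustration: `LabelCheck`, `CountLabels`, and the sample values are hypothetical names and data, and it assumes you have already read the label column into a sequence of strings. It also assumes the "Must be at least 2" message relates to the number of distinct label values, which the question does not confirm.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LabelCheck
{
    // Counts how many rows carry each distinct label value (whitespace-trimmed).
    public static Dictionary<string, int> CountLabels(IEnumerable<string> labels) =>
        labels.GroupBy(l => l.Trim())
              .ToDictionary(g => g.Key, g => g.Count());

    public static void Main()
    {
        // Hypothetical label values; in practice, read them from the
        // label column of the dataset that fails to train.
        var labels = new[] { "1", "1", "1", "1" };

        var counts = CountLabels(labels);
        foreach (var kv in counts)
            Console.WriteLine($"label {kv.Key}: {kv.Value} rows");

        // Assumption: a classifier needs at least two distinct label values.
        if (counts.Count < 2)
            Console.WriteLine("Only one distinct label value found.");
    }
}
```

Running the same check on both datasets would show whether the new one has a different set of label values than the one that trains.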

As for the selection of algorithms, Model Builder uses AutoML to try several trainers and picks one based on accuracy metrics, so you can get different algorithms depending on the dataset you give it. In code, you choose the trainer yourself and the pipeline uses that specific algorithm, so you can simply try out different ones.

For example you can just change your pipeline from this:

    var pipeline = ctx.Transforms.Text
        .FeaturizeText("Features", nameof(SentimentIssue.Text))
        .Append(ctx.BinaryClassification.Trainers
            .LbfgsLogisticRegression("Label", "Features"));

To this:

    // Same featurization, different binary trainer.
    var pipeline = ctx.Transforms.Text
        .FeaturizeText("Features", nameof(SentimentIssue.Text))
        .Append(ctx.BinaryClassification.Trainers
            .SdcaLogisticRegression("Label", "Features"));

Or even just run the new data through the model builder again and see which trainer it picks.

Nooby-Noob

I got the exact same error message.

I fixed it by doing these things:

  1. In Model Builder > Data > Advanced Data Options, make sure to set the Label as Binary, as shown in the screenshot.
  2. Restart Visual Studio a lot.
  3. In the SQL that pulls the CSV from SQL Server, I added ORDER BY NEWID() to randomize the row order of the dataset. I don't know if that matters.
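If you train through code rather than Model Builder, the row shuffling in step 3 can also be done on the ML.NET side. This is a minimal sketch, assuming the Microsoft.ML NuGet package is referenced; `Row`, `ShuffleSketch`, and the generated data are hypothetical and only stand in for your real dataset. Whether shuffling actually affects the error is not established here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

// Hypothetical row type standing in for the real dataset schema.
public class Row
{
    public bool Label { get; set; }
    public float Value { get; set; }
}

public static class ShuffleSketch
{
    // Loads rows into an IDataView, shuffles them, and reads them back.
    public static List<Row> Shuffle(MLContext ctx, IEnumerable<Row> rows)
    {
        IDataView data = ctx.Data.LoadFromEnumerable(rows);
        IDataView shuffled = ctx.Data.ShuffleRows(data, seed: 0);
        return ctx.Data
            .CreateEnumerable<Row>(shuffled, reuseRowObject: false)
            .ToList();
    }

    public static void Main()
    {
        var ctx = new MLContext(seed: 0);

        // Hypothetical data sorted by label, as an unordered export might be.
        var rows = Enumerable.Range(0, 100)
            .Select(i => new Row { Label = i >= 80, Value = i });

        var shuffled = Shuffle(ctx, rows);
        Console.WriteLine($"Rows after shuffle: {shuffled.Count}");
    }
}
```

In a real pipeline you would pass the shuffled `IDataView` on to training instead of materializing it as a list; the list is used here only to keep the sketch easy to inspect.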

enter image description here

Jess