
First I read this: How to interpret weka classification?, but it didn't help me.

To set up the background: I am trying to learn by taking part in Kaggle competitions, where models are evaluated by ROC area.

I built two models, and their evaluation results are reported in this way:

Correctly Classified Instances       10309               98.1249 %
Incorrectly Classified Instances       197                1.8751 %
Kappa statistic                          0.7807
K&B Relative Info Score             278520.5065 % 
K&B Information Score                  827.3574 bits      0.0788 bits/instance 
Class complexity | order 0            3117.1189 bits      0.2967 bits/instance 
Class complexity | scheme              948.6802 bits      0.0903 bits/instance  
Complexity improvement     (Sf)       2168.4387 bits      0.2064 bits/instance 
Mean absolute error                      0.0465 
Root mean squared error                  0.1283 
Relative absolute error                 46.7589 %
Root relative squared error             57.5625 %
Total Number of Instances            10506     

=== Detailed Accuracy By Class ===

           TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
             0.998     0.327      0.982     0.998     0.99       0.992    0
             0.673     0.002      0.956     0.673     0.79       0.992    1
Weighted Avg.    0.981     0.31       0.981     0.981     0.98       0.992
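
For context, a report like the one above is what Weka's Evaluation class prints; the sketch below (placeholder classifier, file name and cross-validation settings, not my actual setup) shows how the same figures can be produced and read programmatically:

    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class EvalReport {
        public static void main(String[] args) throws Exception {
            // Placeholder data set and classifier, not the actual competition setup.
            Instances data = DataSource.read("train.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // 10-fold cross-validation produces the same kind of report as above.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new J48(), data, 10, new Random(1));

            // The printed report (true = include the entropy/complexity statistics)...
            System.out.println(eval.toSummaryString("=== Summary ===", true));
            System.out.println(eval.toClassDetailsString());

            // ...and the individual figures, available as plain numbers.
            System.out.println("Kappa:              " + eval.kappa());
            System.out.println("RMSE:               " + eval.rootMeanSquaredError());
            System.out.println("Rel. abs. error:    " + eval.relativeAbsoluteError() + " %");
            System.out.println("ROC area (class 1): " + eval.areaUnderROC(1));
            System.out.println("Weighted ROC area:  " + eval.weightedAreaUnderROC());
        }
    }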

Apart from the K&B Relative Info Score, the Relative absolute error and the Root relative squared error, which are respectively lower, higher and higher in the better model (better as assessed by ROC curves), all the figures are the same. I built a third model with similar behavior (TP rate and so on), but again the K&B Relative Info Score, Relative absolute error and Root relative squared error varied. However, that did not let me predict whether this third model was superior to the first two: the variations went in the same direction as for the best model, so in theory it should have been superior, but it wasn't.
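
Since the competition metric is ROC area, the head-to-head comparison I have in mind boils down to something like the sketch below (the two classifiers and the data file are only placeholders), ranking candidates by cross-validated weighted ROC area rather than by the error measures:

    import java.util.Random;

    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CompareByRoc {
        // Cross-validated weighted ROC area for one candidate model.
        static double cvRocArea(Classifier model, Instances data) throws Exception {
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(model, data, 10, new Random(1));
            return eval.weightedAreaUnderROC();
        }

        public static void main(String[] args) throws Exception {
            // Placeholder data set and candidate classifiers.
            Instances data = DataSource.read("train.arff");
            data.setClassIndex(data.numAttributes() - 1);

            Classifier[] candidates = { new J48(), new NaiveBayes() };
            for (Classifier c : candidates) {
                // Rank models by the metric the competition actually uses.
                System.out.printf("%-10s ROC area = %.4f%n",
                        c.getClass().getSimpleName(), cvRocArea(c, data));
            }
        }
    }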

What should I look at in such details to predict whether a model will perform well?

Thanks in advance.

Ando Jurai
  • Does this help you: http://scikit-learn.org/stable/modules/model_evaluation.html? When I was studying machine learning, different sites sometimes helped, even if they weren't using Weka or Java. If you can read the code there, you can maybe adapt the information to your problem. I have never predicted the quality of a model before, so I can't give you more detailed information. – KJaeg May 27 '15 at 12:35
  • This could be interesting, too: http://www2.geog.ucl.ac.uk/~mdisney/teaching/GEOGG121/bayes/gauch_model_eval.pdf – KJaeg May 27 '15 at 12:38
  • Well, thanks, I am not sure I will be able to answer my question with that, but at least I will learn something. It is a little counter-intuitive that a model with more error can perform better. Also, I thought of the relative info score as something evaluating the "completeness" of the model relative to the data it is trained on (that is, whether the classification rules found allow every instance to be classified), but it seems that is not exactly the case. – Ando Jurai May 27 '15 at 12:48
  • It has been a long time since I had to work on machine learning topics. I only checked the quality of my models by comparing their actual results; I never had to predict their behaviour. I know that Google returns far too little relevant information for machine learning questions. It turned out that "Machine Learning" by Peter Flach was the best "all in one" resource I had, even if it is not the best-written book. If you can get access to it, take a look. Maybe you'll find something. – KJaeg May 27 '15 at 13:01
  • Back to your question: naively, I would validate the models with cross-validation on different data sets and take those results as the prediction. Is that an option? – KJaeg May 27 '15 at 13:03
  • It was, more or less. I split the provided dataset into 5 folds and did that (roughly along the lines of the sketch below), but it was not much better, actually: I could not obtain anything meaningful in terms of ROC. Part of that is because the test sets had different biases than the training set (the problem was something like a time series: we had some years to train on and other years to predict, and it was pretty poorly modelled as a whole). Anyway, thanks for the pointer to Peter Flach's book, I'll try to read it. Thanks everyone for trying to help. – Ando Jurai Jun 25 '15 at 11:36
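
For reference, the manual 5-fold check mentioned in the comment above looks roughly like the sketch below (placeholder classifier and data file); the spread of the per-fold ROC areas gives an idea of how stable the estimate is:

    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ManualFolds {
        public static void main(String[] args) throws Exception {
            // Placeholder data set and classifier.
            Instances data = DataSource.read("train.arff");
            data.setClassIndex(data.numAttributes() - 1);

            int folds = 5;
            data.randomize(new Random(1));
            data.stratify(folds);   // keep class proportions similar across folds

            for (int i = 0; i < folds; i++) {
                Instances train = data.trainCV(folds, i);
                Instances test = data.testCV(folds, i);

                J48 classifier = new J48();
                classifier.buildClassifier(train);

                // Evaluate on the held-out fold and report the competition metric.
                Evaluation eval = new Evaluation(train);
                eval.evaluateModel(classifier, test);
                System.out.printf("Fold %d: weighted ROC area = %.4f%n",
                        i, eval.weightedAreaUnderROC());
            }
        }
    }

For a time-ordered problem like the one described in the comments, splitting chronologically (training on the earlier years and evaluating on the later ones) instead of randomizing would probably mimic the competition setup more closely.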
