I have to deal with a class imbalance problem and perform binary classification on an input test data set, where the majority of the samples in the training data set have class label 1 (the other class label is 0).
For example, the following is part of the training data:
93.65034,94.50283,94.6677,94.20174,94.93986,95.21071,1
94.13783,94.61797,94.50526,95.66091,95.99478,95.12608,1
94.0238,93.95445,94.77115,94.65469,95.08566,94.97906,1
94.36343,94.32839,95.33167,95.24738,94.57213,95.05634,1
94.5774,93.92291,94.96261,95.40926,95.97659,95.17691,0
93.76617,94.27253,94.38002,94.28448,94.19957,94.98924,0
where the last column is the class label (0 or 1). The actual data set is heavily skewed, with a roughly 10:1 ratio of classes: around 700 samples have 0 as their class label, while the remaining 6800 have 1 as their class label.
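To make the skew concrete, here is a small sketch of the arithmetic using the approximate counts above. The weight formula is an assumption: it is the heuristic scikit-learn documents for `class_weight="balanced"`, namely n_samples / (n_classes * class_count); other libraries may define "balanced" differently.

```python
# Approximate class counts quoted in the question.
counts = {0: 700, 1: 6800}

total = sum(counts.values())
n_classes = len(counts)

# Share of each class in the training data.
fractions = {label: n / total for label, n in counts.items()}

# Majority-to-minority ratio (roughly 9.7:1, i.e. about 10:1).
ratio = counts[1] / counts[0]

# Assumed "balanced" heuristic: n_samples / (n_classes * class_count).
weights = {label: total / (n_classes * n) for label, n in counts.items()}

print(f"total samples: {total}")
print(f"class shares: {fractions}")
print(f"majority:minority ratio: {ratio:.1f}:1")
print(f"balanced weights: {weights}")
```

Under this assumption the minority class 0 is weighted about ten times more heavily than class 1, which mirrors the class ratio.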
The rows shown above are only a few of the samples in the given data set. In the actual data set about 90% of the samples have class label 1 and the rest have class label 0, even though nearly all the samples are very similar to one another.
Which classifier would be best for handling this kind of data set?
I have already tried logistic regression as well as SVM with the class-weight parameter set to "balanced", but got no significant improvement in accuracy.
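For concreteness, the attempt described above might look roughly like the sketch below. It assumes scikit-learn (the library is not named in the question), where the parameter is spelled `class_weight`, and it uses only the six sample rows shown earlier, so it illustrates the setup rather than reproducing any real result. The `max_iter` value is an arbitrary choice for this sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# The six sample rows from the question; the last column is the class label.
rows = np.array([
    [93.65034, 94.50283, 94.6677, 94.20174, 94.93986, 95.21071, 1],
    [94.13783, 94.61797, 94.50526, 95.66091, 95.99478, 95.12608, 1],
    [94.0238, 93.95445, 94.77115, 94.65469, 95.08566, 94.97906, 1],
    [94.36343, 94.32839, 95.33167, 95.24738, 94.57213, 95.05634, 1],
    [94.5774, 93.92291, 94.96261, 95.40926, 95.97659, 95.17691, 0],
    [93.76617, 94.27253, 94.38002, 94.28448, 94.19957, 94.98924, 0],
])
X = rows[:, :-1]
y = rows[:, -1].astype(int)

# Both models reweight classes inversely to their frequency.
models = {
    "logistic regression": LogisticRegression(class_weight="balanced", max_iter=1000),
    "SVM": SVC(class_weight="balanced"),
}

predictions = {}
for name, model in models.items():
    model.fit(X, y)
    # Predicting on the training rows only shows that the pipeline runs;
    # it says nothing about performance on unseen data.
    predictions[name] = model.predict(X)
    print(name, predictions[name])
```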