I have a very large dataset that I need to use for classification. I sampled the data, but that does not guarantee that every label appears in the sample. How can I sample my data so that all labels are covered?
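To make the problem concrete, this is roughly how I sample at the moment (a sketch on toy data; the DataFrame `df` and the `label` column are just placeholders for my real dataset):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset; 'label' is the target column (placeholder name)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature": rng.normal(size=10_000),
    "label": rng.choice(["a", "b", "c", "rare"], size=10_000,
                        p=[0.5, 0.3, 0.199, 0.001]),
})

# Plain random sample: a rare label can easily be missing from the result
sample = df.sample(n=500, random_state=42)
print(sorted(sample["label"].unique()))  # 'rare' may or may not appear here
```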
Also, I want to save the label encoder and the RandomForestClassifier that I used in this process so that I can reuse them for incremental learning.
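Saving them is the easy part; this is the kind of thing I mean (a sketch on toy data, with placeholder file names):

```python
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the first training sample, with string labels like my real data
X, y_raw = make_classification(n_samples=1000, n_classes=3, n_informative=5, random_state=0)
y_str = [f"class_{i}" for i in y_raw]

le = LabelEncoder().fit(y_str)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, le.transform(y_str))

# Persist both fitted objects so they can be reloaded for further training
joblib.dump(le, "label_encoder.joblib")
joblib.dump(clf, "random_forest.joblib")

# Later / in another process:
le = joblib.load("label_encoder.joblib")
clf = joblib.load("random_forest.joblib")
```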
I searched for ways to update a RandomForestClassifier and found that the warm_start option only adds extra estimators rather than updating the existing ones, and that partial_fit is not supported for random forests. So my second question is: how can I update the label encoder and the RandomForestClassifier to train on other datasets that may contain new labels and more data points?
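For reference, this is the warm_start behaviour I am referring to (a sketch on toy data): a second fit with a larger n_estimators only appends new trees trained on the new batch and leaves the existing trees untouched:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Two toy batches standing in for the original sample and a later dataset
X1, y1 = make_classification(n_samples=1000, n_classes=3, n_informative=5, random_state=0)
X2, y2 = make_classification(n_samples=1000, n_classes=3, n_informative=5, random_state=1)

clf = RandomForestClassifier(n_estimators=100, warm_start=True, random_state=0)
clf.fit(X1, y1)
print(len(clf.estimators_))   # 100 trees, all fitted on the first batch

# warm_start only appends extra trees fitted on the new batch;
# the first 100 trees are not updated at all.
clf.n_estimators += 50
clf.fit(X2, y2)
print(len(clf.estimators_))   # 150 trees
```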
- Why not subset the dataset first according to labels and then use sampling on it? That will also make sure your dataset is balanced. – Sociopath Jan 20 '21 at 08:57
- Thank you so much, that's a good point regarding my first question. But what if, in the future, I get new data that has more labels? I would have to use incremental learning to combine my current data with the new data and its labels. – Mee Jan 20 '21 at 09:01