How to create, train, and print out the result from a RandomForestClassifier on a dataset

Question

I have a CSV file called train.csv, shown below:

   25.3, 12.4, 2.35, 4.89, 1, 2.35, 5.65, 7, 6.24, 5.52, M
   20, 15.34, 8.55, 12.43, 23.5, 3, 7.6, 8.11, 4.23, 9.56, B
   4.5, 2.5, 2, 5, 10, 15, 20.25, 43, 9.55, 10.34, B
   1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, M

Thanks to help from other users on Stack Overflow, I was able to load the dataset and use other types of classifiers. I am having trouble understanding how to use RandomForestClassifier. I need to create and train a RandomForestClassifier on the dataset above and print out the result.

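  import pandas as pd
  from sklearn.ensemble import RandomForestClassifier

  # Read all 11 columns: 10 numeric features followed by the class label
  data_train = pd.read_csv("train.csv", header=None, usecols=list(range(11)))
  X_train = data_train[list(range(10))]  # feature columns 0-9
  y_train = data_train[10]               # label column (M or B)
  clf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
  clf.fit(X_train, y_train)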
  RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=2, max_features='auto', max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=2,
            min_weight_fraction_leaf=0.0, n_estimators=100, n_jobs=None,
            oob_score=False, random_state=0, verbose=0, warm_start=False)

I don't understand how to print the RandomForestClassifier to see the results of the classification. I am also not sure what the output is supposed to be. Please explain how RandomForestClassifier works, how it is created and trained, anything I missed, and how to print out the result.

Note: this is related to the Stack Overflow question "Loading a Dataset for Linear SVM Classification from a CSV file".

score 1 · Accepted Answer · answered Nov 21 '19 at 16:31

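You have successfully trained your classifier, which means that it is fitted. The RandomForestClassifier(...) text printed after fit is only the model's parameter listing, not a classification result.
To see results, you need some validation or test data to predict on. Once you have it, you can evaluate the predictions yourself or use a function from scikit-learn: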

from sklearn.metrics import accuracy_score, classification_report

# predict takes only the features; the true labels are used afterwards for scoring
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(classification_report(y_test, y_pred))

Here is the output on your training set (that is, with X_test and y_test set to X_train and y_train). Because the data set is so small, every score is perfect, which you will rarely see in practice.

              precision    recall  f1-score   support

           B       1.00      1.00      1.00         2
           M       1.00      1.00      1.00         2

   micro avg       1.00      1.00      1.00         4
   macro avg       1.00      1.00      1.00         4
weighted avg       1.00      1.00      1.00         4
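
Besides the report, a fitted forest can be inspected directly. The following is a minimal sketch, not the only way to do it: it inlines the four rows from the question through io.StringIO so that it runs without a file (an assumption made for illustration), and it passes skipinitialspace=True (also an assumption) so that the labels are read as "M" and "B" without the blank that follows each comma.

```python
import io

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# The four rows of train.csv from the question, inlined so the sketch runs as-is.
TRAIN_CSV = """25.3, 12.4, 2.35, 4.89, 1, 2.35, 5.65, 7, 6.24, 5.52, M
20, 15.34, 8.55, 12.43, 23.5, 3, 7.6, 8.11, 4.23, 9.56, B
4.5, 2.5, 2, 5, 10, 15, 20.25, 43, 9.55, 10.34, B
1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, M
"""

# skipinitialspace drops the blank after each comma (assumption: labels should be "M"/"B").
data_train = pd.read_csv(io.StringIO(TRAIN_CSV), header=None, skipinitialspace=True)
X_train = data_train.iloc[:, :10].to_numpy()  # 10 feature columns
y_train = data_train.iloc[:, 10].to_numpy()   # label column

clf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
clf.fit(X_train, y_train)

predictions = clf.predict(X_train)          # one predicted label per row
probabilities = clf.predict_proba(X_train)  # one column per class, ordered as clf.classes_
importances = clf.feature_importances_      # one weight per feature column

print("classes:", clf.classes_)
print("predictions:", predictions)
print("probabilities:\n", probabilities)
print("feature importances:", importances)
```

predict gives the predicted label for each row, predict_proba gives the share of the forest's vote for each class, and feature_importances_ indicates how much each column contributed to the splits.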

By test data, do you mean something other than the train.csv I provided? I also have a test.csv. So I should fit on train.csv and then test on test.csv? — user20304030, Nov 21 '19 at 16:48
Yes, you fit it on train.csv and predict on test.csv. Otherwise the score is measured on data the model has already seen and cannot reveal [overfitting](https://en.wikipedia.org/wiki/Overfitting) — Horace, Nov 21 '19 at 16:53
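
The workflow from the comments (fit on train.csv, predict on test.csv) could be sketched as follows. The helper names load_xy and fit_and_evaluate are hypothetical, and the two test rows are made-up placeholders, since the contents of test.csv were never shown. The sketch also assumes test.csv has the same layout as train.csv: 10 feature columns followed by a label.

```python
import io

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report


def load_xy(source):
    """Read a headerless CSV with 10 feature columns followed by 1 label column."""
    data = pd.read_csv(source, header=None, skipinitialspace=True)
    return data.iloc[:, :10].to_numpy(), data.iloc[:, 10].to_numpy()


def fit_and_evaluate(train_source, test_source):
    """Fit on the training data only, then score predictions on the test data."""
    X_train, y_train = load_xy(train_source)
    X_test, y_test = load_xy(test_source)
    clf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    report = classification_report(y_test, y_pred, zero_division=0)
    return y_pred, accuracy_score(y_test, y_pred), report


# Training rows copied from the question.
TRAIN_CSV = """25.3, 12.4, 2.35, 4.89, 1, 2.35, 5.65, 7, 6.24, 5.52, M
20, 15.34, 8.55, 12.43, 23.5, 3, 7.6, 8.11, 4.23, 9.56, B
4.5, 2.5, 2, 5, 10, 15, 20.25, 43, 9.55, 10.34, B
1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, M
"""

# HYPOTHETICAL test rows, invented only so the sketch runs; not real data.
HYPOTHETICAL_TEST_CSV = """22.1, 13.0, 3.1, 5.2, 2, 2.5, 6.0, 7.5, 6.0, 5.8, M
5.0, 3.0, 2.5, 6, 11, 14, 19.5, 40, 9.0, 10.0, B
"""

y_pred, accuracy, report = fit_and_evaluate(
    io.StringIO(TRAIN_CSV), io.StringIO(HYPOTHETICAL_TEST_CSV)
)
print("predictions:", y_pred)
print("accuracy:", accuracy)
print(report)
```

With real files, the same function can be called as fit_and_evaluate("train.csv", "test.csv"), because pd.read_csv accepts a path as well as a file-like object.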
