I am trying to build document classification software that can classify a document into categories such as Financial, Political, and Entertainment.
I am using the BBC dataset. I built a TF-IDF matrix from it, trained a RandomForestClassifier on that matrix, and saved the trained model to a pickle file.
Now I can't figure out how to use the saved pickle file to predict the category of a new document. I have written the code to open a new document, do all the preprocessing, and get the preprocessed text. How do I use this text to classify the document with the saved model? I can't figure out how to add this document to my existing TF-IDF vectors.
I have a documents array holding the text of each file, and here is how I trained the model:
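import pickle
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import train_test_split

# documents: list of preprocessed texts; Y: the category label of each document
vectorizer = CountVectorizer(max_features=1000, min_df=5, max_df=0.8)
X = vectorizer.fit_transform(documents).toarray()
tfidfConverter = TfidfTransformer()
X = tfidfConverter.fit_transform(X).toarray()
X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.3, random_state=0)
classifier = RandomForestClassifier(n_estimators=1000, random_state=0)
classifier.fit(X_Train, Y_Train)
Y_Predict = classifier.predict(X_Test)
# Only the trained classifier is saved here, not the vectorizer or the TF-IDF converter
with open('text_classifier', 'wb') as pickleFile:
    pickle.dump(classifier, pickleFile)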
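For reference, here is a minimal sketch of what I think the prediction step would need. It assumes the fitted vectorizer and TF-IDF converter are pickled together with the classifier (my code above saves only the classifier), and that a new document is passed through transform rather than fit_transform so it gets the same feature columns as the training data. The toy documents, the labels, the temporary file path, and the predict_category helper are all made up for illustration, and I use default CountVectorizer settings because min_df=5 would not work on four documents.

```python
import os
import pickle
import tempfile

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Toy stand-ins for the real corpus and its labels (hypothetical data).
documents = [
    "stocks and shares fell as the bank raised interest rates",
    "the bank reported profit and shares rose on the market",
    "the minister won the election after the parliament vote",
    "parliament debated the election law and the minister resigned",
]
labels = ["business", "business", "politics", "politics"]

# Fit the same three stages as in the training code.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(documents)
tfidf_converter = TfidfTransformer()
X = tfidf_converter.fit_transform(counts)
classifier = RandomForestClassifier(n_estimators=50, random_state=0)
classifier.fit(X, labels)

# Assumption: save all three fitted objects, not just the classifier.
model_path = os.path.join(tempfile.mkdtemp(), "text_classifier")
with open(model_path, "wb") as pickle_file:
    pickle.dump((vectorizer, tfidf_converter, classifier), pickle_file)


def predict_category(text, path):
    """Hypothetical helper: load the saved objects and classify one text."""
    with open(path, "rb") as saved_file:
        saved_vectorizer, saved_tfidf, saved_classifier = pickle.load(saved_file)
    # transform (not fit_transform) reuses the vocabulary and IDF weights
    # learned at training time, so the feature columns line up.
    new_counts = saved_vectorizer.transform([text])
    new_features = saved_tfidf.transform(new_counts)
    return saved_classifier.predict(new_features)[0]


new_document = "the election result surprised the minister"
predicted = predict_category(new_document, model_path)
print(predicted)
```

In my real code the new_document string would be the preprocessed text of the file I opened, and the objects would be fitted on the full BBC documents array.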