Is there an alternative to RTextTools, or another package that supports this kind of classification workflow? RTextTools has been removed, along with maxent and glmnet, and these packages depend on one another. Here is the script I am trying to use for the classification:

library(maxent)
library(openxlsx)
library(RTextTools)
library(readxl)
library("tm")
library("SnowballC")
library("wordcloud")
library("RColorBrewer")
library("tidyverse")
library(purrrlyr)
library(text2vec)
library(caret)
library(glmnet)
library(ggrepel)
library(arm)

cas = read_excel("C:/Users/Desktop/Modelo JUN19/Copia de Data Final Entrenamiento - ML DS CAS - 21-06-2019 - REV01.xlsx")

datos=data.frame(clase=cas$servicio,text=cas$a_subject)




trainIndex <- createDataPartition(datos$clase, p = 0.8, 
                                  list = FALSE,times = 1)

data_train=datos[trainIndex,]
data_test=datos[-trainIndex,]

matrix_train <- create_matrix(data_train$text,language="spanish",stemWords=FALSE)

matrix_test <- create_matrix_test(data_test$text,language="spanish",stemWords=FALSE,
                                  originalMatrix=matrix_train )


container_train <- create_container(matrix_train,data_train$clase,
                                    trainSize=1:length(data_train$clase),virgin=FALSE)



container_test <- create_container(matrix_test,labels=rep(0,length(data_test$clase)),
                                   trainSize=1:length(data_test$clase),
                                   virgin=FALSE)


##################### SVM
# linear kernel
t_svm_lineal <- Sys.time()  
model_SVM_lineal <- train_model(container_train, "SVM", kernel="linear")
print(difftime(Sys.time(), t_svm_lineal, units = 'mins'))
# classification according to the model
clas_svm_lineal_train <- classify_model(container_train, model_SVM_lineal )
clas_svm_lineal_test <- classify_model(container_test, model_SVM_lineal )
# accuracy rate
aceptacion_svm_lineal_train=summary(as.character(data_train$clase)==as.character(clas_svm_lineal_train$SVM_LABEL))
aceptacion_svm_lineal_test=summary(as.character(data_test$clase)==as.character(clas_svm_lineal_test$SVM_LABEL))
# summary of the categories
summary(clas_svm_lineal_train$SVM_LABEL)
table(clas_svm_lineal_train$SVM_LABEL,data_train$clase)
table(clas_svm_lineal_test$SVM_LABEL,data_test$clase)
Sammy

1 Answer

First, the package(s) are not on CRAN anymore but you can still use them if you want. The easiest way is to install them from the archive:

install.packages("https://cran.r-project.org/src/contrib/Archive/maxent/maxent_1.3.3.1.tar.gz", type = "source", repos = NULL)
install.packages("https://cran.r-project.org/src/contrib/Archive/RTextTools/RTextTools_1.4.2.tar.gz", type = "source", repos = NULL)

I recently tested them against some more modern implementations; maxent in particular still holds up pretty well and may find a new home at some point.

Second, there are a number of alternatives for text classification and machine learning. For machine learning in general, the caret package (manual) is not bad and can handle some text classification; keep in mind, however, that it is not optimized for text. A really cool new package that will hopefully make it to CRAN soon is quanteda.classifiers, and quanteda itself already has Naive Bayes implemented (Tutorial).
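
As a rough sketch of the quanteda route, something like the following could stand in for the create_matrix / train_model / classify_model steps. It assumes a recent quanteda in which textmodel_nb lives in the separate quanteda.textmodels package (in older versions it was part of quanteda itself). The toy Spanish texts, labels, and variable names are made up for illustration and are not taken from the question's data.

```r
library(quanteda)
library(quanteda.textmodels)  # assumption: quanteda >= 2.0, where textmodel_nb lives here

# Hypothetical toy data standing in for data_train / data_test
train_txt <- c("el internet no funciona en casa",
               "conexion de internet muy lenta",
               "consulta sobre el pago de la factura",
               "factura mensual con cobro incorrecto")
train_lab <- factor(c("soporte", "soporte", "facturacion", "facturacion"))
test_txt  <- c("internet lento desde ayer",
               "error en el cobro de la factura")

# Document-feature matrix for training (roughly the role of create_matrix)
dfm_train <- dfm(tokens(train_txt, remove_punct = TRUE))
dfm_train <- dfm_remove(dfm_train, stopwords("es"))

# Fit a Naive Bayes classifier on the training matrix
nb <- textmodel_nb(dfm_train, y = train_lab)

# Build the test matrix and align its features with the training features
dfm_test <- dfm(tokens(test_txt, remove_punct = TRUE))
dfm_test <- dfm_match(dfm_test, features = featnames(dfm_train))

# Predict one class label per test document
pred <- predict(nb, newdata = dfm_test)
print(pred)
```

The dfm_match step does the job that originalMatrix did in create_matrix: it forces the test matrix to have exactly the same columns as the training matrix, so the fitted model can be applied to it. With real data you would compare pred against the held-out labels, for example with table().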

Third, there are a lot of other packages out there that I don't know about, and I would not dare suggest that any one of them is better suited to what you want to do than the others. I found this thread a while ago that discusses some options: https://github.com/bnosac/ruimtehol/issues/11.

JBGruber
  • I just replaced RTextTools with caret and quanteda, but now I'm having an issue with the function create_matrix. It throws the following error: > matrix_train <- create_matrix(data_train$text,language="spanish",stemWords=FALSE) Error in create_matrix(data_train$text, language = "spanish", stemWords = FALSE) : could not find function "create_matrix" – Sammy Dec 06 '19 at 19:31