EDIT:
I am trying to model a dataset with a kNN classifier (caret package) in R, but it runs for a very long time and I eventually have to stop it. Sometimes when I stop it, it says "use warnings() to see all warning messages". When I do that, it reports "lots of ties" for each column in the dataset. I found a few solutions to this problem on here, but none of them worked in my situation. They say "put some pseudo-random noise data into data set, and it will work". I tried that, and it didn't work:
https://stats.stackexchange.com/questions/25926/dealing-with-lots-of-ties-in-knn-model
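For reference, a minimal sketch of what the noise approach looks like, assuming numeric feature columns and a label column named class. The toy data frame, the column names term_a and term_b, and the noise width of 1e-6 are illustrative assumptions, not values taken from the real dataset:

```r
set.seed(42)  # make the random noise reproducible

# Toy stand-in for the real training data (hypothetical values).
# Rows 1 and 2 have identical features, which is what produces ties in kNN.
train <- data.frame(
  term_a = c(0, 0, 1, 1),
  term_b = c(0, 0, 0, 1),
  class  = c("earn", "earn", "acq", "crude")
)

# Every column except the label is treated as a feature.
feature_cols <- setdiff(names(train), "class")

# Add a tiny amount of uniform noise to each feature column so that
# previously identical rows no longer sit at exactly the same point.
noisy <- train
noisy[feature_cols] <- lapply(
  train[feature_cols],
  function(col) col + runif(length(col), min = -1e-6, max = 1e-6)
)

# Number of duplicated feature rows before and after adding noise.
print(sum(duplicated(train[feature_cols])))
print(sum(duplicated(noisy[feature_cols])))
```

The noise width is kept far below the smallest gap between real feature values, so it only breaks exact ties and should not change which neighbours are genuinely closest.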
END EDIT.
That's why I am sharing a link to my training dataset, so maybe one of you can work out why kNN gets stuck when modeling it:
http://www.htmldersleri.org/train.csv (It is the well-known Reuters-21578 dataset)
And here is the kNN line in R:
knn<-train(as.factor(class)~.,data=as.matrix(train),method="kNN")
or
knn<-train(as.factor(class)~.,data=train,method="kNN")
Neither of them works.
By the way, using svmLinear instead of kNN does not work either.
An important note: I applied the unique() function to every column and found that no column has only one value. They all vary.
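The check was roughly along these lines. This is a sketch on a toy data frame rather than the real data, and the column names term_a and term_b are made up for illustration:

```r
# Toy stand-in for the real training data (hypothetical values).
train <- data.frame(
  term_a = c(0, 1, 0, 2),
  term_b = c(0, 0, 0, 1),
  class  = c("earn", "acq", "earn", "crude")
)

# Count the distinct values in each column.
n_distinct <- sapply(train, function(col) length(unique(col)))

# Any column with exactly one distinct value would be constant.
constant_cols <- names(n_distinct)[n_distinct == 1]

print(n_distinct)
print(constant_cols)
```

An empty constant_cols means every column takes at least two different values.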
Lastly, here is the dataset description from my project report, which might be useful:
In the Reuters-21578 dataset, we used the top ten classes, with 7269 samples in the training set and 2686 samples in the test set. The distribution of the classes is unbalanced. The largest class has 2899 documents, making up 39.88% of the training set. The smallest class has 113 documents, making up 1.55% of the training set. Table I shows the ten most frequent categories along with the number of training and test set examples in each.