I've been asked to apply knn to predict "income" from "age", "gender" and "occupation" in the adults.txt data. This is my code after loading the data into R:

library(class)
set.seed(1234)
ind <- sample(2, nrow(adult), replace=TRUE, prob=c(0.75, 0.25))
adult.training <- adult[ind==1, c(1,7,10)]
adult.test <- adult[ind==2, c(1,7,10)]
adult.trainLabels <- adult[ind==1, c(15)]
adult.testLabels <- adult[ind==2, c(15)]
adult_pred <- knn(train=adult.training, test=adult.test, cl=adult.trainLabels, k=3)

I get the following errors:

Error in knn(train=adult.training, test=adult.test, cl=adult.trainLabels, :
NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In knn(train=adult.training, test=adult.test, cl=adult.trainLabels, :
NAs introduced by coercion
2: In knn(train=adult.training, test=adult.test, cl=adult.trainLabels, :
NAs introduced by coercion

Is it possible to run knn for "income" based on the above variables, two of which are factors?

Clinton Adams
  • Where is the `adults.txt` data? Please make sure your example is [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – MrFlick Oct 22 '15 at 04:37
  • The data can be downloaded from this link https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data – Clinton Adams Oct 22 '15 at 14:41
  • Your question should be complete without having to download external data. Suggestions for how to do this were provided in the link I gave on how to make your problem reproducible. So you want to run `knn` with categorical data? That doesn't make much sense. `knn` works with "Euclidean distances". What's the distance between a "male" and "female"? You might want to look for a different statistical technique. – MrFlick Oct 22 '15 at 15:30
  • @MrFlick Thanks for that. I do know that knn will not run on variables that are factors; maybe the task that I was given was a bit vague. I'll try an alternative approach. – Clinton Adams Oct 22 '15 at 18:01
  • Even though this is years later, I would like to shed some light on the topic: KNN can be run with a categorical outcome variable. If a predictor is male/female, separate its levels into binary variables; as a target variable it is more than acceptable. Male or Female would be the classification, and the Euclidean distance is the distance between the predictor values. A 1/0 binary coding would still result in a Male or Female outcome. – Michael Cantrall Mar 13 '18 at 23:16
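
The binary-variable idea from the last comment can be sketched roughly as follows. The error in the question most likely arises because `knn` converts its inputs to a numeric matrix, and factor columns holding text labels turn into `NA` along the way. This is only a minimal sketch on made-up data, not the real adult file: the `toy` data frame, its column names (`age`, `sex`, `occupation`, `income`), its values and the rule that generates `income` are all assumptions for illustration. The factor predictors are expanded into 0/1 columns with `model.matrix`, and `age` is rescaled so it does not dominate the distance.

```r
library(class)

set.seed(1234)

# Toy stand-in for the adult data (hypothetical values, not the real file).
n <- 200
toy <- data.frame(
  age        = sample(18:70, n, replace = TRUE),
  sex        = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  occupation = factor(sample(c("Sales", "Tech-support", "Craft-repair"),
                             n, replace = TRUE))
)
# Synthetic label loosely tied to age, only so there is something to learn.
toy$income <- factor(ifelse(toy$age + rnorm(n, sd = 10) > 45, ">50K", "<=50K"))

# Expand each factor into one 0/1 column per level (full one-hot coding),
# then drop the intercept column that model.matrix adds.
x <- model.matrix(
  ~ age + sex + occupation,
  data = toy,
  contrasts.arg = list(
    sex        = contrasts(toy$sex, contrasts = FALSE),
    occupation = contrasts(toy$occupation, contrasts = FALSE)
  )
)[, -1]

# Same style of 75/25 split as in the question.
ind     <- sample(2, n, replace = TRUE, prob = c(0.75, 0.25))
train_x <- x[ind == 1, ]
test_x  <- x[ind == 2, ]

# Rescale age to roughly [0, 1] using the training range only,
# so it is comparable in size to the 0/1 columns.
rng <- range(train_x[, "age"])
train_x[, "age"] <- (train_x[, "age"] - rng[1]) / diff(rng)
test_x[, "age"]  <- (test_x[, "age"]  - rng[1]) / diff(rng)

# The class labels can stay a factor; only the predictors must be numeric.
pred <- knn(train = train_x, test = test_x, cl = toy$income[ind == 1], k = 3)

# Confusion table of predicted vs. actual labels on the held-out rows.
table(predicted = pred, actual = toy$income[ind == 2])
```

Whether Euclidean distance on 0/1 columns is a sensible notion of similarity for a given task is a separate judgment call; the sketch only shows that numeric predictors avoid the coercion error.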
