
Apologies if this has been asked elsewhere, or if the question is poorly posed; this is my first post. I have searched for days and solved all my other errors, but I keep getting this one: "Error in 1:knots.vec[num.ctr] : NA/NaN argument". I am trying to predict a 4-group categorical class (Q72to73_OpportunitySegments) from 13 candidate predictors, of which 11 are factors and 2 are numeric. I read my data into R as a data frame, after removing all rows containing NA. My code works on the example Carseats data, and it also works on my own data when I do NOT standardize the two numeric variables (fldAge and fldSrvcYrs).

Here's the code that works on Carseats data:

library(dplyr)
library(ISLR)
library(knncat)
fix(Carseats) ## 11 vars: 8 continuous, 3 categorical

## move ShelveLoc factor to front of data
Carseats <- Carseats[,c(7,1:6,8:ncol(Carseats))]

## standardize quant vars and drop original quant vars
Carseats_quantvars <- as.data.frame(scale(Carseats[,2:9]))
Carseats_stdzd <- cbind(Carseats[,-(2:9)], Carseats_quantvars); rm(Carseats_quantvars)

set.seed(1)

train = sample(c(TRUE,FALSE), nrow(Carseats_stdzd), replace=TRUE)

knn.pred <- knncat(Carseats_stdzd[train,], Carseats_stdzd[!train,])
knn.pred  ## gives "Test set misclass rate: 48.09%"
knn.pred$vars  ## gives 2 vars used in knncat: Sales, Price

I ran the same steps on my own data and got this:

library(readr)
library(dplyr)
library(knncat)

my_data1 <- read_csv("my_data1.csv", progress=interactive())  ## main datafile

(Does it help to show the column specification that read_csv printed?)

Parsed with column specification:
cols(
  Q72to73_OpportunitySegments = col_character(),
  fldSrvcYrs = col_double(),
  ENG_STATE = col_character(),
  fldAge = col_integer(),
  fldGender = col_character(),
  jobclas_13G = col_character(),
  UNIONSTATUS = col_character(),
  APPTSTATUS = col_character(),
  EDUGRP_4G = col_character(),
  DIRECTREPORTS = col_character(),
  JOBSHELD_4G = col_character(),
  JOBSAPPLY_4G = col_character(),
  NEWJOB = col_character(),
  Region_4g = col_character()
)

my_data1 <- my_data1 %>% mutate_if(is.character, factor)
my_data1$fldAge <- as.numeric(my_data1$fldAge)  ## b/c came in as integer

my_data1 <- my_data1[,c(1,2,4,3,5:ncol(my_data1))]
my_data1_quantvars <- as.data.frame(scale(my_data1[,2:3]))
my_data1_quantvars <- rename(my_data1_quantvars, stdzd_SrvcYrs=fldSrvcYrs, stdzd_Age=fldAge)
my_data1_stdzd <- cbind(my_data1[,-(2:3)], my_data1_quantvars); rm(my_data1_quantvars)

set.seed(1)

train = sample(c(TRUE,FALSE), nrow(my_data1), replace=TRUE)

knn.pred <- knncat(my_data1_stdzd[train,], my_data1_stdzd[!train,])

Error in 1:knots.vec[num.ctr] : NA/NaN argument

This error seems to be tied to one or both of the standardized variables, because knncat runs when I use the very same code on the very same data without standardizing. Any ideas how to solve this? (Unfortunately, I cannot share my actual data due to the Statistics Act.)
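
One way to narrow this down might be to inspect the standardized columns before they reach knncat. The sketch below is only a set of sanity checks, not a confirmed diagnosis of the error: `check_numeric_cols` is a hypothetical helper, and the `toy` data frame is made up for illustration. It reports, for each numeric column, how many values are NA/NaN or infinite and how many distinct values remain. A constant column is worth ruling out because `scale()` divides by a standard deviation of zero and returns NaN for it.

```r
# Hypothetical helper: summarise each numeric column of a data frame.
check_numeric_cols <- function(df) {
  df  <- as.data.frame(df)   # read_csv() returns a tibble; work on a plain data.frame
  num <- df[, sapply(df, is.numeric), drop = FALSE]
  data.frame(
    column     = names(num),
    n_missing  = sapply(num, function(x) sum(is.na(x))),        # counts NA and NaN
    n_infinite = sapply(num, function(x) sum(is.infinite(x))),
    n_distinct = sapply(num, function(x) length(unique(x[!is.na(x)]))),
    row.names  = NULL
  )
}

# Made-up example: one ordinary numeric column and one constant column.
toy <- data.frame(
  grp      = factor(c("a", "b", "a", "b")),
  fldAge   = c(30, 41, 52, 63),
  constant = c(5, 5, 5, 5)
)

toy_scaled <- toy
# The constant column has sd 0, so scale() turns it into NaN.
toy_scaled[, 2:3] <- as.data.frame(scale(toy[, 2:3]))

report <- check_numeric_cols(toy_scaled)
print(report)
```

Running `check_numeric_cols(my_data1_stdzd)` on the real data, together with `class(my_data1_stdzd)` and `sapply(my_data1_stdzd, class)`, would show whether the standardized columns contain anything other than finite numbers and whether the object is a plain data frame.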

steveb
JHawkins
  • Without being able to debug the actual code it is not really possible to help you. – m-dz May 18 '17 at 17:12
  • @JHawkins You are welcome. To get your question answered more quickly, you may want to check out [How to make a great R reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Welcome to Stack Overflow. – steveb May 18 '17 at 17:48
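
Since the real data cannot be shared, a reproducible example along the lines suggested above could use a synthetic stand-in with the same column names and types. The following is a rough sketch under several assumptions: the number of rows, the level counts for columns without a `_4G`/`_13G` suffix, the level names, and the value ranges are all invented, and `fake_factor` is a hypothetical helper. Whether this random data triggers the same error is not known; the knncat call is guarded so the rest runs even if the package is not installed.

```r
set.seed(1)
n <- 200   # assumed row count, for illustration only

# Hypothetical helper: a random factor with k invented levels.
fake_factor <- function(k) {
  factor(sample(paste0("lvl", seq_len(k)), n, replace = TRUE))
}

fake_data <- data.frame(
  Q72to73_OpportunitySegments = fake_factor(4),   # the 4-group class
  fldSrvcYrs    = round(runif(n, 0, 35), 1),      # invented range
  ENG_STATE     = fake_factor(3),
  fldAge        = as.numeric(sample(20:65, n, replace = TRUE)),
  fldGender     = fake_factor(2),
  jobclas_13G   = fake_factor(13),
  UNIONSTATUS   = fake_factor(2),
  APPTSTATUS    = fake_factor(3),
  EDUGRP_4G     = fake_factor(4),
  DIRECTREPORTS = fake_factor(2),
  JOBSHELD_4G   = fake_factor(4),
  JOBSAPPLY_4G  = fake_factor(4),
  NEWJOB        = fake_factor(2),
  Region_4g     = fake_factor(4)
)

# Standardize the two numeric columns by name and append them at the end,
# mirroring the column order produced in the question.
num_cols <- c("fldSrvcYrs", "fldAge")
stdzd <- as.data.frame(scale(fake_data[, num_cols]))
names(stdzd) <- c("stdzd_SrvcYrs", "stdzd_Age")
fake_stdzd <- cbind(fake_data[, !(names(fake_data) %in% num_cols)], stdzd)

train <- sample(c(TRUE, FALSE), n, replace = TRUE)

# Same call as in the question, run only if knncat is available.
if (requireNamespace("knncat", quietly = TRUE)) {
  result <- tryCatch(
    knncat::knncat(fake_stdzd[train, ], fake_stdzd[!train, ]),
    error = function(e) conditionMessage(e)
  )
  print(result)
}
```

If the synthetic data reproduces the error, it can be posted as-is; if it does not, the difference between it and the real data (for example, in level counts or value ranges) is itself a clue.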
