
I want to perform logistic regression with a genetic algorithm for variable selection, using stratified 5-fold cross-validation. When I run my R script, I get this error: `Error in x[which(data$Status == 1), ] : incorrect number of dimensions`. How can I fix it?

My data consist of a dependent variable with 2 classes and 6 independent variables.

```r
#DATA
Status = sample(c(1,0), 50, replace = TRUE)
col1 = sample(c(0,1), 50, replace = TRUE)
col2 = sample(c(0,1), 50, replace = TRUE)
col3 = sample(c(0,1), 50, replace = TRUE)
col4 = sample(c(0,1), 50, replace = TRUE)
col5 = sample(c(0,1), 50, replace = TRUE)
col6 = sample(31:80)
data <- data.frame(Status, col1, col2, col3, col4, col5, col6)

#CONVERT TO FACTOR
for(k in 1:6) {
  data[, k] <- as.factor(data[, k])
}
```

I convert the first six columns (`Status` and `col1` to `col5`) to factors with `as.factor`; `col6` stays numeric.
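The loop turns `Status` and `col1` to `col5` into factors and leaves `col6` as an integer. A minimal sketch of the same conversion in one vectorised step, with a seed added only so the example is reproducible; `str()` is a quick way to confirm the column types before fitting anything. Because these columns are factors from here on, arithmetic applied directly to them (for example `- 1`) returns `NA` with a warning.

```r
set.seed(1)  # only to make the example reproducible
data <- data.frame(
  Status = sample(c(1, 0), 50, replace = TRUE),
  col1 = sample(c(0, 1), 50, replace = TRUE),
  col2 = sample(c(0, 1), 50, replace = TRUE),
  col3 = sample(c(0, 1), 50, replace = TRUE),
  col4 = sample(c(0, 1), 50, replace = TRUE),
  col5 = sample(c(0, 1), 50, replace = TRUE),
  col6 = sample(31:80)
)
# Same effect as the for loop: columns 1 to 6 become factors, col6 stays integer
data[1:6] <- lapply(data[1:6], as.factor)
str(data)
```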

```r
#Randomly shuffle the data
data<-data[sample(nrow(data)),]
datax=data[,2:7]
datay=data[,1]
datay0=data[which(data$Status==1),]
datay0=datay0[,1]
datay1=data[which(data$Status==0),]
datay1=datay1[,1]
n0=length(datay0)
n1=length(datay1)
#Create 5 equally sized folds
folds0 <- cut(seq(1,length(datay0)),breaks=5,labels=FALSE)
folds1 <- cut(seq(1,length(datay1)),breaks=5,labels=FALSE)

library(caret)
#Perform 5 fold cross validation
fitness <- function(string)
  {
  inc <- which(string==1)
  x <- datax[,inc]
  datax0=x[which(data$Status==1),]
  datax1=x[which(data$Status==0),]
  akurasi<- rep(0,5)
  sensi<- rep(0,5)
  speci<-rep(0,5)
  auc<- rep(0,5)
  for(i in 1:5){
    #Segment your data by fold using the which() function
    testIndexes0 <- which(folds0==i,arr.ind=TRUE)
    testIndexes1 <- which(folds1==i,arr.ind=TRUE)
    trainx0 <- datax0[-testIndexes0, ]
    trainx1 <- datax1[-testIndexes1, ]
    trainy0 <- datay0[-testIndexes0]
    trainy1 <- datay1[-testIndexes1]
    testx0 <- datax0[testIndexes0, ]
    testx1 <- datax1[testIndexes1, ]
    testy0 <- datay0[testIndexes0]
    testy1 <- datay1[testIndexes1]
    if(ncol(x)==1){
      xtrain <- data.frame(c(trainx0,trainx1)-1)
      colnames(xtrain) = "x"
      xtest <- data.frame(c(testx0,testx1)-1)
      colnames(xtest) = "x"
    } else {
      xtrain <- rbind(trainx0,trainx1)
      xtest <- rbind(testx0,testx1)
    }
    ytrain <- c(trainy0, trainy1)-1
    ytest <- c(testy0,testy1)-1
    #Use the test and train data partitions however you desire...
    b <- data.frame(cbind(ytrain,xtrain))
    model <- glm(as.factor(ytrain)~.,data=b, family=binomial(link='logit'))
    predtest=as.factor(ifelse(predict(model,xtest,type='response')>0.5,1,0))
    cm <- confusionMatrix(predtest, as.factor(ytest))
    akurasi[i] <- cm$overall['Accuracy']
    sensi[i] <- cm$byClass['Sensitivity']
    speci[i] <- cm$byClass['Specificity']
    auc[i] <- 0.5*(sensi[i]+speci[i])
  }
  mean.auc=mean(auc)
  return(mean.auc)
  }
library(GA)
gaControl("binary" = list(selection = "ga_rwSelection"))
GA_REGLOG <- ga("binary", fitness = fitness, nBits = ncol(datax),
                  names = colnames(datax),
                  selection = gaControl("binary")$selection,
                  popSize = 100, pcrossover = 0.8, pmutation = 0.1, maxFitness = 1)
```
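Shuffling the rows and then calling `cut()` separately on `datay0` and `datay1` gives each class its own five folds, which is what makes the cross-validation stratified. A compact sketch of the same idea assigns one fold id per row instead of keeping two fold vectors; `stratified_folds` is a hypothetical helper name, and the `table()` call shows that every fold receives a near-equal share of both classes.

```r
# Hypothetical helper: one fold id (1..k) per row, stratified by the class label y
stratified_folds <- function(y, k = 5) {
  folds <- integer(length(y))
  # split() groups the row indices by class; each class is spread over folds 1..k
  for (idx in split(seq_along(y), y)) {
    # rep_len() spreads the ids as evenly as possible; sample() shuffles them
    folds[idx] <- sample(rep_len(seq_len(k), length(idx)))
  }
  folds
}

set.seed(42)
Status <- factor(sample(c(1, 0), 50, replace = TRUE))
folds <- stratified_folds(Status, k = 5)
print(table(fold = folds, Status = Status))
```

The test set of fold `i` is then simply `folds == i`, for both classes at once.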

I think the problem is in these two lines:

```r
datax0=x[which(data$Status==1),]
datax1=x[which(data$Status==0),]
```
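The error message names `[.default`, which suggests `x` is no longer a data frame at that point. A likely cause: whenever the GA proposes a chromosome with exactly one `1`, `datax[, inc]` returns that single column as a plain factor, and a vector cannot be indexed as `x[rows, ]`. The small reproduction below uses a hypothetical two-column data frame `df`. The usual fix is `drop = FALSE`, both when selecting columns (`datax[, inc, drop = FALSE]`) and when subsetting rows afterwards (`datax0`, `datax1`, `trainx0`, `testx0` and the other subsets), because row-subsetting a one-column data frame drops it to a vector again.

```r
# Hypothetical toy data: two factor columns, like col1..col5 in the question
df <- data.frame(a = factor(c(0, 1, 1, 0)), b = factor(c(1, 0, 1, 1)))
string <- c(0, 1)            # a GA chromosome that selects a single variable
inc <- which(string == 1)

x_bad <- df[, inc]           # the single column is dropped to a plain factor
print(is.data.frame(x_bad))  # FALSE
res <- try(x_bad[c(1, 3), ], silent = TRUE)
print(inherits(res, "try-error"))  # TRUE: "incorrect number of dimensions"

x_ok <- df[, inc, drop = FALSE]        # keeps a one-column data frame
print(is.data.frame(x_ok[c(1, 3), ]))  # FALSE: row subsetting drops it again
rows <- x_ok[c(1, 3), , drop = FALSE]  # so keep drop = FALSE here as well
print(dim(rows))                       # 2 1
```

With `drop = FALSE` applied consistently, `x` always has two dimensions, so the `ncol(x) == 1` special case in the loop is no longer needed.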
  • We're looking at a whole lot of code but no data. [See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making a reproducible R example folks can help with. Also see [mcve] with emphasis on *minimal*. I'd start debugging by going through line by line to figure out where exactly the error occurs. – camille May 21 '19 at 15:16
  • Take a look at the 2 links I posted to see how to 1. post a representative sample of data that folks don't need to download from a third-party site, and 2. pare this down to just the code needed to solve the issue – camille May 21 '19 at 15:37
  • I'm sorry, I have fixed the data. Is it correct now? – Dinda Galuh May 21 '19 at 16:06
  • The code is long because I don't know how to combine a genetic algorithm with cross-validation in logistic regression, so I use loops, which make the code longer. – Dinda Galuh May 21 '19 at 16:10
  • But if you're trying to pinpoint where the error is, you may not need *all* your code, especially if, as you say, the error comes from the first few lines. I'm not getting an error until I get to the part calling `confusionMatrix`, because you haven't said what package that's from – camille May 21 '19 at 16:30
  • Oh, sorry about `confusionMatrix`: it comes from the caret library. I get the error after I run `GA_REGLOG`. The message is ``Error in `[.default`(x, which(data$Status == 1), ) : incorrect number of dimensions``, so I think the problem is in the fitness function. – Dinda Galuh May 21 '19 at 16:44
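Putting the pieces together, here is a minimal sketch of the fitness function under a few stated assumptions: the data are simulated as in the question (with a seed added), fold ids are assigned per row and stratified by `Status`, and the fold score is computed directly instead of through caret, so the block runs with base R only. Besides `drop = FALSE`, it returns 0 for an all-zero chromosome (no predictors selected) and keeps the labels as factors instead of subtracting 1 (on R 4.1.0 and later, `c()` of two factors returns a factor, and `factor - 1` gives `NA` with a warning).

```r
# Assumption: data simulated exactly as in the question, plus a seed
set.seed(2019)
data <- data.frame(
  Status = sample(c(1, 0), 50, replace = TRUE),
  col1 = sample(c(0, 1), 50, replace = TRUE),
  col2 = sample(c(0, 1), 50, replace = TRUE),
  col3 = sample(c(0, 1), 50, replace = TRUE),
  col4 = sample(c(0, 1), 50, replace = TRUE),
  col5 = sample(c(0, 1), 50, replace = TRUE),
  col6 = sample(31:80)
)
data[1:6] <- lapply(data[1:6], as.factor)
datax <- data[, 2:7]

# Stratified fold id per row: each class is spread evenly over folds 1..5
folds <- integer(nrow(data))
for (idx in split(seq_len(nrow(data)), data$Status)) {
  folds[idx] <- sample(rep_len(1:5, length(idx)))
}

fitness <- function(string) {
  inc <- which(string == 1)
  if (length(inc) == 0) return(0)  # no predictors selected: lowest fitness
  x <- datax[, inc, drop = FALSE]  # stays a data frame even with one column
  bal_acc <- numeric(5)
  for (i in 1:5) {
    test <- folds == i
    train <- data.frame(y = data$Status[!test], x[!test, , drop = FALSE])
    model <- glm(y ~ ., data = train, family = binomial(link = "logit"))
    # Predicted probability of the second factor level, "1"
    prob <- predict(model, newdata = x[test, , drop = FALSE], type = "response")
    pred <- factor(ifelse(prob > 0.5, "1", "0"), levels = c("0", "1"))
    truth <- data$Status[test]
    sens <- mean(pred[truth == "1"] == "1")  # recall of class 1
    spec <- mean(pred[truth == "0"] == "0")  # recall of class 0
    # Balanced accuracy: the same quantity the question's code stores as "auc"
    bal_acc[i] <- 0.5 * (sens + spec)
  }
  mean(bal_acc)
}
```

The function can be passed as `fitness = fitness` to the `ga()` call in the question. If you prefer to keep caret's `confusionMatrix`, build both arguments with `factor(..., levels = c(0, 1))`: caret expects the predicted and reference factors to have the same levels, and `as.factor()` on predictions that are all one class produces only one level.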
