
I am trying to build a predictive classification model using binary logistic regression and penalized LASSO, and eventually compare the two models. Before fitting them, I want to understand the data better and run some diagnostics, such as a multicollinearity test, but the data types are being converted incorrectly.

The data set consists of both numeric and factor variables. I imported the data into R from a CSV file, and before importing I manually changed all the variables that were "factors" to "numeric". I then selected the specific columns I want from the whole data set. After applying `as.matrix` I expected the result to be a numeric matrix, but this is not the case.

Data<- read.csv("Test.csv")
names(Data)
attach(Data)
dim(Data)
sapply(Data,class)
ChurnFlag <- ifelse(ChurnedFlag=="Y",1,0)

# combining all the newly created variables
DataMat <- as.matrix(cbind(Data,ChurnFlag))

# selecting specifically which variables I want to analyse, which are all numeric/integer
DataMatRed <- as.matrix((DataMat[,c(4:8,10:73,92)]))

DataMatRedNum <- mapply(DataMatRed,FUN=as.numeric) 
#defining the matrix as numeric
is.numeric(DataMatRedNum) #checking that it is numeric

DataMatDF <- as.data.frame(DataMatRed)
DataMatDF2 <- data.frame(DataMatRed, row.names = NULL, check.rows = FALSE, check.names = TRUE)

I expect to get a numeric matrix, not a character one. When I try to run the `colldiag` function in R it does not work, and the error is as follows:

Error in svd(X) : infinite or missing values in 'x'

I have checked for missing values and there are none.
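One thing worth checking at this point: `svd()` raises this error for any non-finite entry, and `as.numeric()` turns text it cannot parse into `NA`, so missing values can appear during the conversion even when the CSV itself has none. Below is a minimal sketch of a per-column check on a made-up character matrix. The names `m_chr`, `m_num` and `bad_per_col` are hypothetical and not part of the code above. `storage.mode<-` is used here because, unlike `mapply(..., FUN = as.numeric)`, it keeps the matrix dimensions.

```r
# Made-up character matrix standing in for DataMatRed; "n/a" is not a number
m_chr <- matrix(c("1", "2", "n/a", "4", "5", "6"), ncol = 2,
                dimnames = list(NULL, c("a", "b")))

# Convert in place; dimensions and column names are preserved.
# The unparseable entry becomes NA (R warns about coercion; suppressed here).
m_num <- m_chr
suppressWarnings(storage.mode(m_num) <- "double")

# Count non-finite entries (NA, NaN, Inf) per column
bad_per_col <- colSums(!is.finite(m_num))
print(bad_per_col)
```

Any column with a non-zero count would be enough to trigger the `svd` error, so this may help narrow down where the conversion goes wrong.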

  • It is difficult to help you if you do not provide your data. In general, to convert factors to numeric you should use `as.numeric(as.character(x))`. – Cettt May 23 '19 at 08:23
  • Possibly related: https://stackoverflow.com/questions/3418128/how-to-convert-a-factor-to-integer-numeric-without-loss-of-information – RLave May 23 '19 at 08:26
  • @Cettt - It is impossible to share the data due to its sensitivity. The thing is that I already converted the data from factor to numeric myself in Excel, but when applying `as.matrix()` R itself is converting the integer/numeric variables into characters – Lise May 23 '19 at 08:31
  • You don't have to post all of the data, only the first five rows and the relevant columns. When working with data in R you don't convert it to matrix format but to data.frame format. The function `read.csv` has several input arguments like `na.strings` and `stringsAsFactors` which can be used to avoid factors in the first place. – Cettt May 23 '19 at 08:34
  • @Cettt - But if I want to run then a `rcorr` in R the data type must be a matrix no? – Lise May 23 '19 at 10:44
  • yes but you can convert your data to matrix inside the `rcorr` function – Cettt May 23 '19 at 11:12
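The suggestions in the comments can be sketched roughly as follows. This is only an illustration on a made-up data frame, not the real `Test.csv`: the columns `Tenure` and `Spend` and the names `mixed`, `num_cols` and `DataNum` are hypothetical. It shows that `as.matrix()` on a data frame that still contains a text column (such as `ChurnedFlag`) returns a character matrix, and that selecting only the numeric columns first gives a numeric one.

```r
# Made-up stand-in for the imported data (hypothetical columns)
Data <- data.frame(
  Tenure = c(12, 5, 30, 8),
  Spend = factor(c("20.5", "13", "45.2", "18.9")),  # numbers read in as a factor
  ChurnedFlag = c("Y", "N", "N", "Y"),
  stringsAsFactors = FALSE
)

# Factor -> numeric without losing the values (as suggested by Cettt)
Data$Spend <- as.numeric(as.character(Data$Spend))

# Add the 0/1 flag as a column instead of relying on attach()
Data$ChurnFlag <- ifelse(Data$ChurnedFlag == "Y", 1, 0)

# One text column is enough to make the whole matrix character
mixed <- as.matrix(Data)
print(is.character(mixed))

# Keep only the numeric columns, then convert to a matrix
num_cols <- vapply(Data, is.numeric, logical(1))
DataNum <- as.matrix(Data[, num_cols, drop = FALSE])
print(is.numeric(DataNum))
```

Selecting columns from the data frame before calling `as.matrix()` avoids the character conversion entirely, and the resulting matrix can then be handed to whichever function needs matrix input.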
