I desperately need help!
I am trying to predict drug use based on 5 characteristics: Age, Gender, Education, Ethnicity, Country. I already build a tree model in R with rpart
DrugTree3 <- rpart(formula = DrugUser ~ Age+Gender+Education+Ethnicity+Country, data = traindata)
, a logistic regression model
DrugLog <- glm(formula = DrugUser ~ Age+Gender+Ethnicity+Education+Country,data = traindata, family = binomial)
, and a knn model
KnnModel <- train(form = DrugUser~., data = ModelData,method ='knn',tuneGrid=expand.grid(.k=1:100),metric='Accuracy',trControl=trainControl(method='repeatedcv',number=10,repeats=10))
.
I saved those as RDS files and uploaded them successfully in Power BI.
I then created tables for each characterization and created okviz filters for them.
Then I tried to predict whether a customer gets predicted as a drug user or a non-drug user based on the selections in the okviz filters. This is when everything went horribly wrong:
I created a custom R visual vor each model prediction and inserted the following code in each visual:
# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:
# dataset <- data.frame(chunk_id, model_id, model_str, AgeLabel, GenderLabel, CountryLabel, EducationLabel, EthnicityLabel)
# dataset <- unique(dataset)
# Paste or type your script code here:
library(dplyr)
from_byte_string = function(x) {
xcharvec = strsplit(x, " ")[[1]]
xhex = as.hexmode(xcharvec)
xraw = as.raw(xhex)
unserialize(xraw)
}
# R Visual imports tables with read.csv but no argument for strings_as_factors = F.
# This means some of the chunks are truncated (ie if they had a " " at the end).
# If you convert to a character and add a space if nchar == 9999 the deserialization works.
# (Thanks to Danny Shah)
dataset <- dataset %>%
mutate( model_str = as.character(model_str) ) %>%
mutate( model_str = ifelse(nchar(model_str) == 9999, paste0(model_str, " "), model_str) )
model_vct <- dataset %>%
filter(model_id == 1) %>%
distinct(model_id, chunk_id, model_str) %>%
arrange(model_id, chunk_id) %>%
pull(model_str)
finalfit.str <- paste( model_vct, collapse = "" )
finalfit <- from_byte_string(finalfit.str)
# get the user parameters
userdata <- dataset %>% select(AgeLabel,GenderLabel,CountryLabel,EducationLabel,EthnicityLabel) %>% unique()
# and then using them to make a prediction
myprediction <- predict(finalfit,newdata=data.frame(Age=userdata$AgeLabel,Gender=userdata$GenderLabel,Country=userdata$CountryLabel, Education=userdata$EducationLabel,Ethnicity=userdata$EthnicityLabel))
maxpred <- which(myprediction==max(myprediction))
myclass <- maxpred - 1
myprob <- myprediction[[maxpred]]
plot.new()
text(0.5,0.5,labels=sprintf("P(class = %s) = %s",myclass,as.character(round(myprob,2))),cex=3.5)
Error: Can't determine relationship between fields.
What has gone wrong here?
When I then clicked on the diagonal arrow to get to R Studio, this happens: Unable to construct R script data for use in external R IDE.
I need help as I am literally going crazy over this and I don't know how to resolve the issue! I would be really happy if you can help me