I'm in a data science course; we are trying to create a simple decision tree using rpart() for an assignment. I'm by no means an advanced developer, so bear that in mind.
My code runs fine until we get to executing rpart(), where it hangs and crashes RStudio. Every time.
There are only about 120 lines of code so far. My data is imported from a .csv, there are only 102 variables and 56 observations. So not large files.
I had to rename most of the columns using R (post import) to tidy them (removed spaces, shortened, etc.)
Environment: MacOS Mojave, Macbook Pro, Desktop version of RStudio.
- Uninstalled/reinstalled RStudio & all packages
- Ran MacOS updates
- Imported clean .csv
- installed library(data.table), tried to import VS10 as.data.table instead of as.data.frame.
- tried running code in R Console
library(rpart)
library(skimr)
library(rpart.plot)
library(tidyverse)
library(data.table)
VS10 <- read_csv("VS10.csv")
#convert VS10 to dataframe
VS10 <- as.data.frame(VS10)
#rename features for model
names(VS10)[41] <- c("Violent_Crime")
names(VS10)[49:52] <-c("Absent_1_5","Absent_6_8","Absent_9_12","SusorExpelled")
names(VS10)[65] <- c("HS_Dropout")
student_risk <- c(VS10$Absent_1_5,VS10$Absent_6_8,VS10$Absent_9_12,VS10$SusorExpelled,VS10$HS_Dropout)
VS10["Violent_Crime"]
#merge absentee & dropout, suspended/expelled separate variables into one feature
mean_student_risk <- mean(student_risk)
VS10_feature <- transform(VS10,mean_student_risk)
skim(VS10_feature$Violent_Crime)
summary(VS10_feature$Violent_Crime)
#recode into quartiles
VS10_feature$dcat[VS10_feature$Violent_Crime > 22.16] <- "4th"
VS10_feature$dcat[VS10_feature$Violent_Crime >= 15.31 & VS10$Violent_Crime <= 22.16] <- "3rd"
VS10_feature$dcat[VS10_feature$Violent_Crime >= 9.53 & VS10$Violent_Crime <= 15.31] <- "2nd"
VS10_feature$dcat[VS10_feature$Violent_Crime < 9.53] <- "1st"
#subset the data with the variables you want to use
VS10_feature2 <- VS10_feature[c(1:39,42:102)]
VS10_feature2$dcat
fitch <- rpart(VS10_feature2$dcat ~ .,
data=VS10_feature2,
method="class")
No error messages, but the console just hangs and eventually I have to terminate the R session. Code is error free up to the point of executing RPart()