
I'm in a data science course; we are trying to create a simple decision tree using rpart() for an assignment. I'm by no means an advanced developer, so bear that in mind.

My code runs fine until we get to executing rpart(), where it hangs and crashes RStudio. Every time.

There are only about 120 lines of code so far. My data is imported from a .csv with only 102 variables and 56 observations, so the files are not large.

I had to rename most of the columns using R (post import) to tidy them (removed spaces, shortened, etc.)

Environment: macOS Mojave, MacBook Pro, desktop version of RStudio.

What I've tried so far:

  1. Uninstalled and reinstalled RStudio and all packages
  2. Ran macOS updates
  3. Imported a clean .csv
  4. Installed data.table and tried converting VS10 with as.data.table instead of as.data.frame
  5. Tried running the code in the R console

library(rpart)
library(skimr)
library(rpart.plot)
library(tidyverse)
library(data.table)

VS10 <- read_csv("VS10.csv")

#convert VS10 to dataframe
VS10 <- as.data.frame(VS10)

#rename features for model
names(VS10)[41] <- c("Violent_Crime")
names(VS10)[49:52] <-c("Absent_1_5","Absent_6_8","Absent_9_12","SusorExpelled")
names(VS10)[65] <- c("HS_Dropout")
student_risk <- c(VS10$Absent_1_5,VS10$Absent_6_8,VS10$Absent_9_12,VS10$SusorExpelled,VS10$HS_Dropout)
VS10["Violent_Crime"]


#merge absentee & dropout, suspended/expelled separate variables into one feature
mean_student_risk <- mean(student_risk)

VS10_feature <- transform(VS10,mean_student_risk)

skim(VS10_feature$Violent_Crime)
summary(VS10_feature$Violent_Crime)

#recode into quartiles 
VS10_feature$dcat[VS10_feature$Violent_Crime > 22.16] <- "4th"
VS10_feature$dcat[VS10_feature$Violent_Crime >= 15.31 & VS10_feature$Violent_Crime <= 22.16] <- "3rd"
VS10_feature$dcat[VS10_feature$Violent_Crime >= 9.53 & VS10_feature$Violent_Crime <= 15.31] <- "2nd"
VS10_feature$dcat[VS10_feature$Violent_Crime < 9.53] <- "1st"


#subset the data with the variables you want to use
VS10_feature2 <- VS10_feature[c(1:39,42:102)]


VS10_feature2$dcat


fitch <- rpart(VS10_feature2$dcat ~ .,
               data=VS10_feature2,
               method="class")

There are no error messages; the console just hangs and eventually I have to terminate the R session. The code runs without errors up to the rpart() call.
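
For reference, the quartile recoding step in the code above could also be written with `cut()` and `quantile()` rather than hard-coded thresholds. This is only a sketch on made-up data: the vector `violent_crime` is a hypothetical stand-in for `VS10_feature$Violent_Crime`, and the labels mirror the ones used above.

```r
# Hypothetical stand-in for VS10_feature$Violent_Crime (56 observations)
set.seed(42)
violent_crime <- runif(56, min = 0, max = 40)

# Quartile boundaries computed from the data itself
breaks <- quantile(violent_crime, probs = seq(0, 1, by = 0.25), na.rm = TRUE)

# Assign each observation to a quartile; include.lowest keeps the minimum value
dcat <- cut(violent_crime,
            breaks = breaks,
            labels = c("1st", "2nd", "3rd", "4th"),
            include.lowest = TRUE)

# Show how many observations fall in each quartile
print(table(dcat))
```

Because `cut()` returns a factor, the result can be used directly as a classification response, and every value falls into exactly one bin.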

Phil
VRCoder13
  • I'm not sure if it makes a difference, but your `rpart` call has a possible mistake. Since you're using the `data=` argument, you don't need to reference the dataset again in your formula - just `rpart(dcat ~ ., data=VS10_feature2, method="class")` should do it. – thelatemail Jul 26 '19 at 00:01
  • Thanks @thelatemail - I tried your suggestion but I'm afraid it still hangs. – VRCoder13 Jul 26 '19 at 00:27
  • If you would like to fully uninstall then clean reinstall R and RStudio, follow the steps here: https://stackoverflow.com/a/61187094/1953250 – ozturkib Jun 09 '20 at 09:07
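
The formula point raised in the first comment can be checked without running `rpart()` at all, by asking base R which predictors each formula implies. The sketch below uses a small made-up data frame `toy` (a hypothetical stand-in for `VS10_feature2`). As far as I understand, `.` is expanded against the column names of `data`, so the `$` form may leave `dcat` itself among the predictors; printing both lists shows whether that happens.

```r
# Hypothetical toy data standing in for VS10_feature2
toy <- data.frame(
  dcat = factor(c("1st", "2nd", "3rd", "4th")),
  x1   = c(1.2, 3.4, 5.6, 7.8),
  x2   = c(10, 20, 30, 40)
)

# Predictors implied by the formula suggested in the comment
clean_terms <- attr(terms(dcat ~ ., data = toy), "term.labels")
print(clean_terms)

# Predictors implied by the formula used in the question; compare with the list above
dollar_terms <- attr(terms(toy$dcat ~ ., data = toy), "term.labels")
print(dollar_terms)
```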

2 Answers

0

I'm having the same issue! I have to force quit RStudio because it hangs indefinitely. If I run rpart() with only two or three features/variables, it runs fine.

0

As it turns out, one of the 100+ columns was importing as a character vector. I omitted this column, and rpart() worked just fine.
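
A minimal sketch of how one might track down such a column, using base R only. The data frame `df` and its columns `x1` and `id_text` are made up for illustration and stand in for `VS10_feature2`. One plausible explanation for the hang, not confirmed in this thread, is that a character column is treated as a categorical predictor, and with a multi-class response and many distinct values the split search can become extremely slow.

```r
# Toy stand-in for VS10_feature2 (hypothetical data, not the real file)
set.seed(1)
df <- data.frame(
  dcat    = sample(c("1st", "2nd", "3rd", "4th"), 56, replace = TRUE),
  x1      = rnorm(56),
  id_text = paste0("school_", 1:56),  # character column, unique per row
  stringsAsFactors = FALSE
)

# Which columns were imported as character vectors?
is_chr   <- sapply(df, is.character)
chr_cols <- names(df)[is_chr]
print(chr_cols)

# How many distinct values does each character column hold?
n_levels <- sapply(df[chr_cols], function(col) length(unique(col)))
print(n_levels)

# Keep the response, drop the other character columns before modelling
drop_cols <- setdiff(chr_cols, "dcat")
df_model  <- df[, !(names(df) %in% drop_cols), drop = FALSE]

# Store the response as a factor so it is clearly a class label
df_model$dcat <- factor(df_model$dcat)
```

A column that is meant to be numeric but was read as text (for example because of a stray symbol in the .csv) could instead be converted with `as.numeric()` after cleaning, rather than dropped.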

VRCoder13