I need help with `createDataPartition`. I am getting this error:

Error in createDataPartition(TBdta$medv, p = 0.8, list = FALSE) : y must have at least 2 data points

My code:

library(tibble)
library(caret)  # provides createDataPartition()
dta <- url("http://course1.winona.edu/bdeppa/Stat%20425/Data/Boston_Housing.csv")
TBdta <- as_tibble(read.csv(dta, check.names = FALSE)) 
TBdta

The error occurs when I run the chunk below:

# Split out validation dataset
# create a list of 80% of the rows in the original dataset we can use for training
set.seed(7)
validationIndex <- createDataPartition(TBdta$medv, p=0.80, list=FALSE)
# select 20% of the data for validation
validation <- TBdta$medv[-validationIndex,]
# use the remaining 80% of data to training and testing the models
dataset <- TBdta$medv[validationIndex,]

Ronak Shah
Matrix007
  • I'm afraid that is just a typo. The column name is `MEDV` and you are using `medv` in lower case. – Ronak Shah Dec 07 '19 at 01:07
  • Thank you, but that doesn't work either. I still get: Warning messages: 1: Unknown or uninitialised column: 'medv'. 2: Unknown or uninitialised column: 'medv'. – Matrix007 Dec 07 '19 at 01:16
  • Are you sure you are using the correct column name? It works for me: `validationIndex <- caret::createDataPartition(TBdta$MEDV, p=0.80, list=FALSE)` – Ronak Shah Dec 07 '19 at 01:18
  • This is my code: `set.seed(7)`, then `validationIndex <- caret::createDataPartition(TBdta$MEDV, p=0.80, list=FALSE)`, then `validation <- TBdta$MEDV[-validationIndex,]` (select 20% of the data for validation), then `dataset <- TBdta$MEDV[validationIndex,]` (use the remaining 80% of the data for training and testing the models). – Matrix007 Dec 07 '19 at 01:25
  • I still get an error. Warning messages: 1: Unknown or uninitialised column: 'medv'. 2: Unknown or uninitialised column: 'medv'. 3: Unknown or uninitialised column: 'medv'. Error in TBdta$MEDV[-validationIndex, ] : incorrect number of dimensions – Matrix007 Dec 07 '19 at 01:25
  • The error occurs at `validation <- TBdta$MEDV[-validationIndex,]` – Matrix007 Dec 07 '19 at 01:28

1 Answer

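`TBdta$MEDV` is a plain vector, so indexing it with `[i, ]` fails with "incorrect number of dimensions". I guess what you need is to index the rows of the whole tibble instead: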

set.seed(7)
validationIndex <- caret::createDataPartition(TBdta$MEDV, p=0.80, list=FALSE)
validation <- TBdta[-validationIndex,]
dataset <- TBdta[validationIndex,]

So that you have

dim(validation)
#[1] 99 14
dim(dataset)
#[1] 407  14
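
To see both errors from the thread in isolation, here is a minimal sketch in base R. The data frame `toy` and its values are made up for illustration, and `sample()` is only a stand-in for the index that `createDataPartition()` would return. Note that a plain data frame returns `NULL` silently for an unknown column, whereas a tibble also emits the "Unknown or uninitialised column" warning.

```r
# Hypothetical toy data standing in for the Boston housing tibble.
toy <- data.frame(CRIM = runif(10), MEDV = rnorm(10, mean = 22, sd = 5))

# Column names are case-sensitive: `medv` does not exist, so `$` returns NULL.
# A NULL outcome has length 0, which is why createDataPartition() reports
# that y must have at least 2 data points.
length(toy$medv)   # 0
length(toy$MEDV)   # 10

# Stand-in for the createDataPartition() index: 80% of the row numbers.
set.seed(7)
idx <- sample(nrow(toy), size = round(0.8 * nrow(toy)))

# toy$MEDV is a plain vector, so two-dimensional indexing raises an error.
vec_err <- tryCatch(toy$MEDV[idx, ], error = function(e) conditionMessage(e))
vec_err

# Index the rows of the whole data frame instead.
train <- toy[idx, ]
valid <- toy[-idx, ]
dim(train)         # 8 rows, 2 columns
dim(valid)         # 2 rows, 2 columns
```

The same row indexing applies to the tibble in the question once the column name is spelled `MEDV`.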
Ronak Shah