
I'm getting the following error when I try to run createDataPartition in caret.

Error in createDataPartition(data1, p = 0.8, list = FALSE) : 
  y must have at least 2 data points

I ran the exact same code last night with no errors. Any thoughts?

predictors<- with(df, data.frame(xvar, xvar, xvar, xvar))
data1<-with(dfu2, data.frame(data1))
library(caret)
set.seed(1)
trainingRows<- createDataPartition(data1,
                                   p=.80,
                                   list=FALSE)

> dput(head(data1, 15))
structure(list(data1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L)), .Names = "data1", row.names = c(NA, 15L), class = "data.frame")

The data frame data1 is clearly visible in my environment and has the expected observations.

hmgogo
  • Please add a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – phiver Oct 22 '15 at 13:30

2 Answers


This does not work because data1 is a data.frame in your case, whereas createDataPartition expects a vector, as described in the documentation (?createDataPartition). See this example:

#using your data
data1 <- structure(list(data1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L)), .Names = "data1", row.names = c(NA, 15L), class = "data.frame")

Now if I do:

> createDataPartition(data1)
Error in createDataPartition(data1) : y must have at least 2 data points

I get the same error as you. Whereas, if it is a vector:

> createDataPartition(data1[[1]] )
$Resample1
[1]  1  2  3  4  8  9 12 15

It works great.

So just pass data1[[1]] (or data1$data1) instead of data1 in your createDataPartition call and it will work.
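
To make the cause a bit more concrete, here is a minimal sketch using the 15 rows from the question. My understanding (an assumption about caret's internals, not something stated in the question) is that the error comes from a length check on y: length() of a data frame counts columns, so a one-column data frame looks like a single data point. The names n_cols, n_obs, train and test are made up for illustration, and the caret call is guarded in case the package is not installed.

```r
# The question's first 15 rows, rebuilt as a one-column data frame
data1 <- data.frame(data1 = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L,
                              0L, 1L, 0L, 0L, 0L, 1L, 1L))

# length() of a data frame is its number of columns, not rows
n_cols <- length(data1)

# Extract the column as a plain vector; data1$data1 is equivalent
y <- data1[[1]]
n_obs <- length(y)

# Pass the vector, not the data frame, to createDataPartition
if (requireNamespace("caret", quietly = TRUE)) {
  set.seed(1)
  trainingRows <- caret::createDataPartition(y, p = 0.80, list = FALSE)

  # Use the returned row indices to split the original data frame
  train <- data1[trainingRows, , drop = FALSE]
  test  <- data1[-trainingRows, , drop = FALSE]
}
```

With list = FALSE the result is a one-column matrix of row indices, which can be used directly for row subsetting as above.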

LyzandeR

I solved the same problem by converting the target from character to factor, because the downSample function requires the response to be a factor. Hope this helps.
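
A rough sketch of that conversion, in case it helps. The data frame df and its columns x1 and target are hypothetical, invented only for this example; the downSample call is guarded in case caret is not installed.

```r
# Hypothetical data: one predictor and a character target
df <- data.frame(x1 = c(1.2, 3.4, 2.2, 5.1, 4.8, 0.7),
                 target = c("yes", "no", "no", "no", "yes", "no"),
                 stringsAsFactors = FALSE)

# Convert the character target to a factor before resampling
df$target <- factor(df$target)

# downSample expects y to be a factor with the class labels
if (requireNamespace("caret", quietly = TRUE)) {
  balanced <- caret::downSample(x = df[, "x1", drop = FALSE],
                                y = df$target,
                                yname = "target")
}
```

The same factor(...) step applies if the target is later passed on to createDataPartition as a vector.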

Qing Yuan