Error in sample.split in R, 'SplitRatio' parameter has to be i [0, 1]

Question

I need to analyse negative or positive text messages, and find out which words define a positive or negative text. At this point, I need to split the data between a test set and a training set. However, this happens:

library(caTools)
split = sample.split(smsSparse$sentiment, SplitRatio = .7)
# Error in sample.split(smsSparse$sentiment, SplitRatio = 0.7) : 
#   Error in sample.split: 'SplitRatio' parameter has to be i [0, 1] range or [1, length(Y)] range

As suggested in this post, I changed "smsSparse$Negative = sms$Negative" to "smsSparse$Negative = sms$negative", but it didn't help. I also tried 7/10 and 0,7 instead of 0.7, with the same result.

Can someone tell me why R thinks that 0.7 is not between 0 and 1?

@Frank I changed the title. http://www.inside-r.org/packages/cran/caTools/docs/sample.split is this what you mean with "be more clear where I found it"? This method of splitting data in a test and train set was also used by the teacher, so I don't see why it shouldn't work. — Joris de Jong, Mar 30 '16 at 18:33
I meant to edit that info into the question itself. I've done it following the usual pattern (with a library call). Thanks for clarifying. (I don't use that package and so can't be of help, but maybe someone else can.) — Frank, Mar 30 '16 at 18:36
@Frank okay, do you by chance know an other package/method which will also split the data into a train and test set? — Joris de Jong, Mar 30 '16 at 18:41

score 2 · Answer 1 · answered Sep 21 '18 at 08:32

set.seed(1000)
library(caTools)
split = sample.split(letters$isB, SplitRatio = 0.5)

Here isB is the name of the dependent variable column. Replace it with the corresponding column name from your own dataset, and check that the name exists there.

Here you can find why this error is raised.

Anant

justin1.618 · Accepted Answer · 2017-01-10T18:49:33.573

I have never used the function sample.split before. However, normally I partition my data without using such a function. For example, say I want to partition the iris data set into a training and testing data set and I want the training to be about 70% of the size of the original data set. Then I can do this:

data(iris)

# Draw a random sample of row indices from 1 to nrow(iris), without replacement
samp <- sample(1:nrow(iris), size=round(0.7*nrow(iris)), replace=FALSE)

train <- iris[samp,]  #Only takes rows that are in samp
test <- iris[-samp,] #Omits the rows that were in samp

The same can be done with a vector, except that the comma is not needed in [samp,] or [-samp,]. I hope that helps. Otherwise, providing the first 6 entries of smsSparse$sentiment might help people identify the problem.
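As a minimal sketch of the vector case mentioned above: the vector `labels` below is a made-up stand-in for smsSparse$sentiment (the real data is not shown in the question), and the seed value is arbitrary.

```r
set.seed(42)  # arbitrary seed, only so the split is reproducible

# Hypothetical label vector standing in for smsSparse$sentiment
labels <- rep(c("negative", "positive"), times = c(40, 60))

# Draw 70% of the positions at random, without replacement
samp <- sample(seq_along(labels), size = round(0.7 * length(labels)))

train_labels <- labels[samp]   # no comma: a vector has a single dimension
test_labels  <- labels[-samp]  # every element that was not sampled
```

Note that this is a plain random split. It does not try to keep the proportion of each label equal in both parts.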

score 0 · Answer 3 · answered Feb 24 '17 at 18:55

Check that smsSparse$sentiment is assigned correctly. If something went wrong during cbind, or the column name is misspelled, R throws an error like this.
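To illustrate why a spelling mistake can go unnoticed, here is a small sketch. The data frame `sms_demo` is hypothetical and only stands in for smsSparse; the point is that `$` with a name that matches no column returns NULL rather than raising an error.

```r
# Hypothetical stand-in for smsSparse; the real data is not shown in the question
sms_demo <- data.frame(sentiment = c("positive", "negative", "positive"))

# A name that matches no column returns NULL, with no error or warning
wrong <- sms_demo$Sentiment   # note the capital S
is.null(wrong)                # TRUE
length(wrong)                 # 0

# Checking the name explicitly catches the mistake before splitting
"sentiment" %in% names(sms_demo)   # TRUE
"Sentiment" %in% names(sms_demo)   # FALSE
```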

Manoj Subramanyam

When it serves the purpose, should we really need to care if it is even a tag? – Manoj Subramanyam Feb 28 '17 at 15:56

score 0 · Answer 4 · answered Dec 08 '17 at 15:30

As someone correctly mentioned, this is likely an assignment error: for example a spelling mistake, or a column that does not exist or is NULL. It may also be that the column you are splitting on (the dependent variable) is not a factor, in which case you can convert it to one. To check quickly, look at a summary of smsSparse$sentiment and confirm it contains what you expect.
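The quick checks described above could look roughly like this. `sms_demo` is again a hypothetical stand-in for smsSparse, and the factor conversion simply follows the suggestion in this answer; whether sample.split actually needs a factor is not confirmed here.

```r
# Hypothetical stand-in for smsSparse
sms_demo <- data.frame(
  sentiment = c("positive", "negative", "positive", "negative"),
  stringsAsFactors = FALSE
)

y <- sms_demo$sentiment

is.null(y)    # should be FALSE; TRUE means the column does not exist
length(y)     # should equal nrow(sms_demo)
summary(y)    # for a character column this reports only length, class and mode

# Convert to a factor if per-class counts are wanted
sms_demo$sentiment <- as.factor(sms_demo$sentiment)
summary(sms_demo$sentiment)   # counts per level
```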

score 0 · Answer 5 · answered Dec 05 '18 at 22:30

Looking at the source code of the sample.split function in caTools, you will see the following lines:

if (SplitRatio >= nSamp)
    stop("Error in sample.split: 'SplitRatio' parameter has to be i [0, 1] range or [1, length(Y)] range")

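Here nSamp is the length of the first argument. There could be two reasons for this error: 1) the length of your data is less than or equal to the SplitRatio; 2) the first parameter to the split function is NULL, which has length 0, so any SplitRatio fails the check.

Make sure the first parameter you are passing actually contains data.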
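The effect of that check can be seen in isolation with a small sketch. `ratio_is_rejected` is a hypothetical helper written for illustration; it mirrors only the comparison quoted above and is not the caTools implementation.

```r
# Hypothetical helper that mirrors only the comparison quoted above;
# it is not part of caTools
ratio_is_rejected <- function(Y, SplitRatio) {
  nSamp <- length(Y)      # length(NULL) is 0
  SplitRatio >= nSamp
}

ratio_is_rejected(NULL, 0.7)              # TRUE: 0.7 >= 0, so the error would fire
ratio_is_rejected(c("a", "b", "c"), 0.7)  # FALSE: 0.7 < 3
ratio_is_rejected(c("a", "b", "c"), 5)    # TRUE: 5 >= 3
```

This is why a perfectly valid ratio such as 0.7 is reported as out of range when the first argument turns out to be NULL.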

score 0 · Answer 6 · answered Aug 29 '20 at 23:45

sample.split works once the caTools package is installed and loaded. You can install it with

install.packages('caTools')

then load it with

library('caTools')

After running the above lines, you can then do something like this

split = sample.split(smsSparse$sentiment, SplitRatio = 0.7)

If your data frame is called dataset, for instance, you can then do something like

training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
