I need to analyse positive and negative text messages and find out which words mark a text as positive or negative. At this point, I need to split the data into a training set and a test set. However, this happens:

library(caTools)
split = sample.split(smsSparse$sentiment, SplitRatio = .7)
# Error in sample.split(smsSparse$sentiment, SplitRatio = 0.7) : 
#   Error in sample.split: 'SplitRatio' parameter has to be i [0, 1] range or [1, length(Y)] range

As suggested in this post, I changed "smsSparse$Negative = sms$Negative" to "smsSparse$Negative = sms$negative", but it didn't help. I also tried 7/10 and 0,7 instead of 0.7, with the same result.

Can someone tell me why R thinks that 0.7 is not between 0 and 1?

Joris de Jong
  • @Frank I changed the title. http://www.inside-r.org/packages/cran/caTools/docs/sample.split Is this what you mean by "be more clear where I found it"? This method of splitting data into a training and test set was also used by the teacher, so I don't see why it shouldn't work. – Joris de Jong Mar 30 '16 at 18:33
  • I meant to edit that info into the question itself. I've done it following the usual pattern (with a library call). Thanks for clarifying. (I don't use that package and so can't be of help, but maybe someone else can.) – Frank Mar 30 '16 at 18:36
  • @Frank okay, do you by chance know another package/method that will also split the data into a training and test set? – Joris de Jong Mar 30 '16 at 18:41

6 Answers


set.seed(1000)
library(caTools)
split = sample.split(letters$isB, SplitRatio = 0.5)

Here isB is the name of the dependent variable (the label column) in the letters data frame. Look up the corresponding column name in your own dataset and use that instead.

Here you can find why this error is raised.
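As a minimal sketch of that lookup step, the block below uses a small made-up data frame, df, as a stand-in for smsSparse; get_label() is a hypothetical helper, not a caTools function. It prints the column names and stops with a readable message when the requested label column does not exist, instead of silently passing NULL on to sample.split:

```r
# Hypothetical toy data standing in for smsSparse
df <- data.frame(
  free      = c(1, 0, 0, 1),  # hypothetical word-count column
  sentiment = c("positive", "negative", "negative", "positive")
)

# List the available column names to find the exact spelling of the label
print(names(df))

# Hypothetical helper: return the label column, or stop with a clear message
get_label <- function(data, col) {
  if (!col %in% names(data)) {
    stop("column '", col, "' not found; available columns: ",
         paste(names(data), collapse = ", "))
  }
  data[[col]]
}

labels <- get_label(df, "sentiment")
print(length(labels))  # one label per row
```

The returned vector is what you would then pass as the first argument of sample.split.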

Anant

I have never used the function sample.split before; I normally partition my data without such a function. For example, say I want to split the iris data set into a training set and a test set, with the training set about 70% of the size of the original data set. Then I can do this:

data(iris)

# Draw a random sample of row indices from 1 to nrow(iris), without replacement
samp <- sample(1:nrow(iris), size=round(0.7*nrow(iris)), replace=FALSE)

train <- iris[samp,]  #Only takes rows that are in samp
test <- iris[-samp,] #Omits the rows that were in samp

The same can be done with a vector, except that the comma is not needed in [samp,] or [-samp,]. I hope that helps. Otherwise, posting the first 6 entries of smsSparse$sentiment (for example with head(smsSparse$sentiment)) might help people identify the problem.
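Since the question also asks for an alternative, here is a sketch of a stratified version of the same base-R idea, which keeps each class at roughly the same proportion in both sets (the property sample.split is documented to preserve). stratified_split() is a hypothetical helper written for illustration; it is not caTools' algorithm, and unlike sample.split it only accepts a ratio strictly between 0 and 1:

```r
# Hypothetical helper: stratified split in base R (not caTools' exact algorithm).
# Returns a logical vector: TRUE for rows assigned to the training set.
stratified_split <- function(y, ratio = 0.7) {
  stopifnot(length(y) > 0, ratio > 0, ratio < 1)
  in_train <- logical(length(y))
  for (lev in unique(y)) {          # a factor is iterated as character values
    idx <- which(y == lev)          # rows belonging to this class
    n_train <- round(ratio * length(idx))
    # sample(idx, n) misbehaves when idx has length 1, so index into idx instead
    chosen <- idx[sample.int(length(idx), n_train)]
    in_train[chosen] <- TRUE
  }
  in_train
}

data(iris)
set.seed(42)
in_train <- stratified_split(iris$Species, ratio = 0.7)
train <- iris[in_train, ]   # rows marked TRUE
test  <- iris[!in_train, ]  # all remaining rows
table(train$Species)        # 35 of each of the 3 species
```

Returning a logical vector rather than indices mirrors the shape of sample.split's output, so the result can be used with subset() or direct row indexing.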

justin1.618

Check that smsSparse$sentiment is assigned correctly. If something went wrong during cbind, or the column name is misspelled, R throws an error like this.


As someone mentioned correctly, this is most likely an assignment error: for example, a spelling error, or the column does not exist or is NULL. It may also be that the column you are splitting on (the dependent variable) is not a factor, in which case you can convert it to one. To check quickly, look at summary(smsSparse$sentiment) and confirm it contains what you expect.
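A quick way to see the spelling problem in action, using a toy data frame in place of smsSparse (sentimnt is a deliberate typo): $ quietly returns NULL for a column that does not exist, and that NULL is what then reaches sample.split, while [, "name"] fails immediately with an explicit error:

```r
# Hypothetical toy data; "sentimnt" below is a deliberate misspelling
df <- data.frame(sentiment = c("positive", "negative", "positive"))

# `$` on a column that does not exist silently returns NULL ...
print(is.null(df$sentimnt))
print(length(df$sentimnt))

# ... whereas [, "name"] stops with "undefined columns selected"
res <- tryCatch(df[, "sentimnt"], error = function(e) conditionMessage(e))
print(res)

# Inspect the real column before splitting; convert to a factor if you want one
summary(df$sentiment)
df$sentiment <- as.factor(df$sentiment)
print(levels(df$sentiment))
```

Note that $ on a data frame also does partial matching, so a misspelling that happens to be a prefix of another column name can silently pick that column instead of returning NULL.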

Aditya Arora

If you look at the source code of the sample.split function in caTools, you will see the following lines:

# nSamp is the number of elements in the first argument, i.e. length(Y)
if (SplitRatio >= nSamp)
    stop("Error in sample.split: 'SplitRatio' parameter has to be i [0, 1] range or [1, length(Y)] range")

There are two likely reasons for this error: 1) the length of your data is less than or equal to SplitRatio, or 2) the first argument to sample.split is NULL, which has length 0.

Make sure the first argument you pass actually contains data.
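To make the two cases concrete, the sketch below mirrors the condition quoted above in plain R, assuming (as the error message suggests) that nSamp is length(Y). split_ratio_fails() is a hypothetical helper, not part of caTools. Because length(NULL) is 0, any positive SplitRatio, including 0.7, fails the check when the first argument is NULL:

```r
# Hypothetical helper mirroring the check in sample.split,
# assuming nSamp = length(Y): the call fails when SplitRatio >= length(Y)
split_ratio_fails <- function(Y, SplitRatio) {
  nSamp <- length(Y)
  SplitRatio >= nSamp
}

print(length(NULL))                                    # 0
print(split_ratio_fails(NULL, 0.7))                    # TRUE: 0.7 >= 0
print(split_ratio_fails(c("pos", "neg", "pos"), 0.7))  # FALSE: 0.7 < 3
```

This is why changing 0.7 to 7/10 or 0,7 makes no difference: the problem is the empty first argument, not the ratio.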


sample.split only works once the caTools package is installed and loaded. You can install it with

install.packages('caTools')

then load it with

library('caTools')

After running the above lines, you can do something like this:

split = sample.split(smsSparse$sentiment, SplitRatio = 0.7)

If, for example, your data frame is called dataset, you can then do something like this:

training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
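A self-contained sketch of that pattern, with a made-up dataset and, since caTools may not be installed, a logical vector drawn with base sample() standing in for the output of sample.split. It also shows that indexing rows directly with the logical vector gives the same two sets:

```r
# Hypothetical toy data frame standing in for `dataset`
dataset <- data.frame(x = 1:10, label = rep(c("pos", "neg"), 5))

# Stand-in for sample.split's output (a logical vector, one entry per row),
# drawn with base R so this sketch runs without caTools installed
set.seed(1)
split <- seq_len(nrow(dataset)) %in% sample(nrow(dataset), 7)

training_set <- subset(dataset, split == TRUE)
test_set     <- subset(dataset, split == FALSE)

# Equivalent, without subset(): index the rows with the logical vector directly
training_set2 <- dataset[split, ]
test_set2     <- dataset[!split, ]

print(c(nrow(training_set), nrow(test_set)))  # 7 3
```

Whatever produces the logical vector, the two subsets together always contain every row exactly once.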
Anthony