I'm getting a "longer object length is not a multiple of shorter object length" warning in R when comparing two integers to subset a dataframe inside a user-defined function.

The user-defined function just returns the median of a subset of integers taken from a dataframe:

medianfuncEDB <- function(s){ 
    return(median((subset(EDB,as.integer(validSession) == as.integer(s)))$absStudentDeviation))
}

(I did not originally have the as.integer coercions in there. I added them while debugging and testing, and I'm still getting the warning.)

The specific warning I'm getting is:

In as.integer(validSession) == as.integer(s) : longer object length is not a multiple of shorter object length
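
For readers who have not met this warning before, here is a minimal base R sketch of how it arises. The vectors are made up for illustration and are not taken from the question's data; the point is only that `==` recycles the shorter operand and warns when the lengths do not divide evenly.

```r
# Made-up vectors: lengths 5 and 2, and 5 is not a multiple of 2.
longer  <- c(1L, 2L, 1L, 2L, 1L)
shorter <- c(1L, 2L)

# `==` recycles the shorter vector; capture the warning text instead of printing it.
msg <- tryCatch(
  {
    longer == shorter
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
print(msg)

# With a length-1 right-hand side the recycling is clean, so no warning is raised.
single <- longer == 1L
print(single)
```

So the warning says something about the lengths of the two sides of `==`, not about their types.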

I get this warning over 50 times when calling:

mediandf <- ddply(mediandf,.(validSession),
                           transform,
                           grossMed2 = medianfuncEDB(as.integer(validSession)))

The goal is to calculate the median of absStudentDeviation for each validSession in the large dataframe EDB and attach the result to mediandf as a new column.

I have actually double-checked that all values for validSession in both the mediandf dataframe and the EDB dataframe are integers by subsetting with is.integer(validSession).

Furthermore, the command appears to do what I intend: I get a new column in my dataframe, although I have not verified its values. I still want to understand the warning. If "medianfuncEDB" is being called with an integer as its input, why am I getting "longer object length is not a multiple of shorter object length" when s == validSession is evaluated?

Note that simple function calls, like medianfuncEDB(5) work without any problems, so why do I get warnings when using ddply?

EDIT: I found the problem with the help of joran's comment. I did not know that transform feeds entire vectors into the function. Using validSession[1] instead gave no warnings.
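
A rough sketch of why validSession[1] helps, written in base R with split() and lapply() as a stand-in for ddply plus transform so that it runs without plyr. The toy EDB and mediandf, the score column, and the simplified body of medianfuncEDB are assumptions made for illustration; the real data is not shown in the question.

```r
# Hypothetical toy data; the real EDB and mediandf are not shown in the question.
EDB <- data.frame(
  validSession        = c(1L, 1L, 1L, 2L, 2L),
  absStudentDeviation = c(1, 2, 6, 4, 10)
)
mediandf <- data.frame(validSession = c(1L, 1L, 2L), score = c(10, 20, 30))

# Expects a single session id, as the question intended.
medianfuncEDB <- function(s) {
  median(EDB$absStudentDeviation[EDB$validSession == s])
}

# Base R stand-in for ddply(..., transform, ...): split by session, add a
# column to each piece, then bind the pieces back together.
pieces <- split(mediandf, mediandf$validSession)
pieces <- lapply(pieces, function(piece) {
  # piece$validSession is a whole vector (one entry per row of the piece),
  # so pass only its first element to the function.
  piece$grossMed2 <- medianfuncEDB(piece$validSession[1])
  piece
})
result <- do.call(rbind, pieces)
print(result)
```

Each piece holds every row for one session, which is why the function sees a vector rather than a single value unless it is indexed with [1].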

David R
  • Can you provide some sample data? – Chase Dec 19 '11 at 00:09
  • I'm commenting rather than answering, since this will be tough to address without a reproducible example. However, it is unlikely to be related to coercion (`as.integer`). Are you sure that validSession will always be exactly the same length as s? Maybe you meant to use `%in%` rather than `==`? – joran Dec 19 '11 at 00:20
  • If you use the debugging tools (http://stackoverflow.com/questions/1882734/what-is-your-favorite-r-debugging-trick/5156351#5156351), you will be able to compare what you think your data looks like to what it actually does. Specifically, try setting options(error=recover). – Ari B. Friedman Dec 19 '11 at 00:28
  • joran, I must be misunderstanding how ddply operates. I was assuming it worked row-by-row when transforming the data. The function medianfuncEDB is intended to take a bare integer, not a vector of integers, so in my mind both "s" and "validSession" are integers rather than vectors when they are compared. Perhaps I'm missing something about how "transform" works here. – David R Dec 19 '11 at 01:10
  • Okay, looks like the simple error here is that I was not aware that the entire vector was being fed into my function. I am new to R and thought that the transform function worked on each row separately, so I thought that "validSession" meant "the validSession value for this row", not "the entire validSession vector for this partition of the dataframe." – David R Dec 19 '11 at 01:27
  • I think changing it to validSession[1] works. (All the validSession values are the same for that segment of the dataframe.) – David R Dec 19 '11 at 01:28
  • @DavidR Feel free to add your description of your solution as an answer and then accept it. That way folks can see at a glance that this issue is resolved. – joran Dec 19 '11 at 03:29

1 Answer

The ddply function already subsets your data frame by validSession. Hence transform is only fed a data frame with all the rows corresponding to a particular validSession.

That is, transform is already being fed subset(mediandf,validSession==s) for each s in unique(mediandf$validSession).

Since you don't have to do any subsetting (ddply takes care of that), all you need to do is:

ddply(mediandf,.(validSession),transform,grossMed2=median(absStudentDeviation))

And then you'll get mediandf back out with a new column grossMed2 with the value you want (so it will be the same value within each unique validSession).

mathematical.coffee
  • Thanks for your answer, but this won't give the correct values. I'm looking for the median of values from another dataframe (EDB rather than mediandf). I think the problem is simply that I didn't realize transform would feed the entire vector into my udf that I intended to only receive a single integer as argument. – David R Dec 19 '11 at 01:25
  • Ahh, I understand now. I agree. You could try a `print(s)` in `medianfuncEDB` just to make sure the values are all the same within each function call (they should be), and then just do something like `as.integer(s)[1]`. – mathematical.coffee Dec 19 '11 at 01:28
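
The check suggested in the last comment could also be built into the function itself. The following is only a sketch under assumptions: the toy EDB is hypothetical, and the guard with stop() is one possible design, not something proposed in the thread. It takes the first element of s only after confirming that every element is the same session id.

```r
# Hypothetical toy data standing in for the real EDB.
EDB <- data.frame(
  validSession        = c(1L, 1L, 1L, 2L, 2L),
  absStudentDeviation = c(1, 2, 6, 4, 10)
)

medianfuncEDB <- function(s) {
  s <- as.integer(s)
  # Fail loudly if the caller passes zero or several different session ids.
  if (length(unique(s)) != 1L) {
    stop("medianfuncEDB expects one session id, got ", length(unique(s)))
  }
  median(EDB$absStudentDeviation[EDB$validSession == s[1]])
}

one_id   <- medianfuncEDB(2L)             # a single id
repeated <- medianfuncEDB(c(1L, 1L, 1L))  # identical ids, as transform would pass
mixed    <- tryCatch(medianfuncEDB(c(1L, 2L)), error = function(e) "error")
print(list(one_id, repeated, mixed))
```

With a guard like this, a call that mixes session ids stops with an error instead of silently recycling.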