This should be simple, but I just can't get apply to communicate with my vectorised function.

Test data is:

df <- data.frame(a = 1:3, b1 = c(4:5, NA), b2 = c(5, 6, 5))

It looks like this:

  a b1 b2
1 1  4  5
2 2  5  6
3 3 NA  5

The custom function returns a logical vector indicating whether each value is a non-missing whole number that falls in a given interval.

validScore <- function(x, a, b) {
  !is.na(x) &
    x %% 1 == 0 &
    findInterval(x, c(a, b), rightmost.closed = TRUE) == 1
}

Test of the custom function: validScore(c(3, 3.5, 6, NA), 1, 5) returns the logical vector TRUE FALSE FALSE FALSE, as expected.

I want to run the custom function on each row, using the values in columns b1 and b2. This would return TRUE FALSE FALSE (that is, TRUE for (b1=4, b2=5), FALSE for (b1=5, b2=6) and FALSE for (b1=NA, b2=5)).

The answers to "Call apply-like function on each row of dataframe with multiple arguments from each row" (for selecting the columns) and "how to apply a function to every row of a matrix (or a data frame) in R" together suggest the following:

library(dplyr)
apply(select(df, b1:b2), 1, function(x) validScore(x, 1, 5))

but that doesn't treat the row as a single unit; each value is assessed individually, so the output is:

   [,1]  [,2]  [,3]
b1 TRUE  TRUE FALSE
b2 TRUE FALSE  TRUE

Sticking a rowwise() into the middle like select(df, b1:b2) %>% rowwise() %>% apply(1, function(x) validScore(x, 1, 5)) makes no difference.

I thought it might be something to do with the form that the dplyr select returned, but apply(df[, c("b1", "b2")], 1, function(x) validScore(x, 1, 5)) also generates the same result.
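A minimal sketch of what apply appears to be doing here, using the df and validScore from above (the names row1, per_value, res and per_row are only for illustration). It suggests that each row does reach the function as a vector, but because validScore answers per element, apply binds one length-2 result per row into the columns of a matrix:

```r
# Test data and element-wise validScore, as defined in the question
df <- data.frame(a = 1:3, b1 = c(4:5, NA), b2 = c(5, 6, 5))
validScore <- function(x, a, b) {
  !is.na(x) &
    x %% 1 == 0 &
    findInterval(x, c(a, b), rightmost.closed = TRUE) == 1
}

# A single row, as a numeric vector of length 2 (what apply passes along)
row1 <- unlist(df[1, c("b1", "b2")])
per_value <- validScore(row1, 1, 5)   # one logical per element, not per row

# apply() stores each row's result as a column, hence a 2 x 3 matrix
res <- apply(df[, c("b1", "b2")], 1, function(x) validScore(x, 1, 5))
dim(res)                              # 2 3

# Collapsing each column of that matrix gives one value per original row
per_row <- apply(res, 2, all)         # TRUE FALSE FALSE
```

So the missing step seems to be a reduction (such as all) from the per-value results to a single value per row.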

JenB
  • this function works on one x at a time so you will need to combine them after or you will get a matrix like you show, use `all` or `rowSums`/`colSums`: `rowSums(Vectorize(validScore)(df[, -1], 1, 5)) > 1` – rawr Sep 03 '15 at 11:52
  • yes @rawr, you (and @csgillespie) are correct. I am using the validScore function in two different ways and that is the conflict. I originally wrote it to identify which scores were invalid. – JenB Sep 03 '15 at 12:12

1 Answer

You don't need dplyr or plyr. You can just use base R.

The first thing to do is to make validScore return only a single TRUE or FALSE. This can be done using the all function:

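validScore <- function(x, a, b) {
  # Element-wise checks: not missing, whole number, inside [a, b]
  test <- !is.na(x) &
    x %% 1 == 0 &
    findInterval(x, c(a, b), rightmost.closed = TRUE) == 1
  # Collapse to a single TRUE/FALSE for the whole vector
  all(test)
}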

After that, just use the standard apply:

## Select columns 2 & 3 (b1 and b2); a and b are the interval from the question
apply(df[, 2:3], 1, validScore, a = 1, b = 5)
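If validScore also needs to stay element-wise for other uses, one option is to leave it as in the question and do the collapsing in a small wrapper. This is a sketch only: validRow is a hypothetical helper name, not part of the original code, and the Reduce variant is an alternative that avoids apply altogether.

```r
df <- data.frame(a = 1:3, b1 = c(4:5, NA), b2 = c(5, 6, 5))

# Element-wise check, unchanged in behaviour from the question
validScore <- function(x, a, b) {
  !is.na(x) &
    x %% 1 == 0 &
    findInterval(x, c(a, b), rightmost.closed = TRUE) == 1
}

# Hypothetical wrapper: TRUE only if every value in the row is valid
validRow <- function(x, a, b) all(validScore(x, a, b))

by_row <- apply(df[, c("b1", "b2")], 1, validRow, a = 1, b = 5)
# TRUE FALSE FALSE

# Same result without apply(): check each column, then combine with &
by_col <- Reduce(`&`, lapply(df[c("b1", "b2")], validScore, a = 1, b = 5))
# TRUE FALSE FALSE
```

The column-wise version keeps everything vectorised, which may matter on larger data frames, since apply first converts the selected columns to a matrix and then loops over the rows.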
csgillespie
  • thanks, yes. Your solution has made me realise that I may be using validScore in two (incompatible) ways. – JenB Sep 03 '15 at 12:14
  • There's still something wrong here. Testing with `apply(df[, 2:3], 1, validScore, a=1, b=5)` gets `FALSE FALSE FALSE` but should be TFF – JenB Sep 03 '15 at 12:25
  • Sorry, I made a change to validScore. Works now. – csgillespie Sep 03 '15 at 12:27
  • Does indeed. And it also worked when I made the adjustments to the real code in light of my incompatible function problem. – JenB Sep 03 '15 at 12:35