
I have a data.frame and want to count, for each row, how many values exceed a specified criterion. The part I cannot figure out is how to use a different criterion for each row.

Say I have 10 rows: I want 10 different criteria, one for each row.

I tried: count.above <- rowSums(Data > rate), where rate is a vector with the 10 criteria, but R used only the first as the criterion for the whole frame.

I imagine I could split my frame into 10 vectors and perform this task, but I thought there would be some simple way to do this without resorting to that.

zx8754
JimO
  • Welcome to Stack Overflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – zx8754 Oct 21 '16 at 08:37

2 Answers


Edit: this depends on whether you want to operate over rows or columns. See below:

This is a job for mapply and Reduce. Suppose you have a data frame along the lines of

df1 <- data.frame(a=1:10,b=2:11,c=3:12)

Let's say we want to count the rows where a>6, b>3 and c>5. This is done with mapply:

mapply(">",df1,c(6,3,5),SIMPLIFY=FALSE)

$a
[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE

$b
[1] FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

$c
[1] FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE

Now we use Reduce to find those which are all TRUE:

Reduce("&",mapply(">",df1,c(6,3,5),SIMPLIFY=FALSE))
[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE

Lastly, we use sum to add them all up:

sum(Reduce("&",mapply(">",df1,c(6,3,5),SIMPLIFY=FALSE)))
[1] 4

If you want a result for each row rather than a global aggregate, then apply is the function to use:

apply(df1,1,function(v) sum(v>c(6,3,5)))
[1] 0 0 1 2 2 2 3 3 3 3
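
Note that the apply call above uses one threshold per column (6 for a, 3 for b, 5 for c). If the goal is instead one threshold per row, as the question seems to ask, a minimal sketch could use sweep. The vector row_thresholds below is a hypothetical example, not something from the question:

```r
# Same example frame as above.
df1 <- data.frame(a = 1:10, b = 2:11, c = 3:12)

# Hypothetical per-row thresholds: one value for each of the 10 rows.
row_thresholds <- c(2, 2, 2, 5, 5, 5, 8, 8, 8, 20)
stopifnot(length(row_thresholds) == nrow(df1))

# sweep() with MARGIN = 1 pairs row i with row_thresholds[i];
# rowSums() then counts the TRUE values in each row.
above <- sweep(as.matrix(df1), 1, row_thresholds, FUN = ">")
count_above <- rowSums(above)
count_above
# [1] 1 2 3 1 2 3 1 2 3 0
```

This assumes every column is numeric, since as.matrix would otherwise coerce the whole frame to character.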
JDL
  • I think they want one variable to filter over one row and sum, another var to filter next row and sum, etc. Result should be 10 numbers, one for each row. – zx8754 Oct 21 '16 at 08:32
  • Oh okay, in that case it's probably `apply(df1,1,function(v) sum(v>thresholds))` where `thresholds` is the vector of thresholds. I was solving a more difficult problem :) – JDL Oct 21 '16 at 08:42

Given the dummy data (from @zx8754's solution)

# dummy data
df1 <- data.frame(matrix(1:15, nrow = 3))

myRate <- c(7, 5, 1)

Solution using apply

Courtesy of @JDL

rowSums(apply(df1, 2, function(v) v > myRate))
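
To see why this gives one count per row, it may help to look at the intermediate result. The sketch below reuses the dummy data; the name above and the length check are additions for illustration only:

```r
df1 <- data.frame(matrix(1:15, nrow = 3))
myRate <- c(7, 5, 1)

# One rate per row is assumed; guard against a mismatched length.
stopifnot(length(myRate) == nrow(df1))

# Each column v holds one entry per row, so v > myRate compares
# the value in row i with myRate[i].
above <- apply(df1, 2, function(v) v > myRate)
above
#         X1    X2    X3   X4   X5
# [1,] FALSE FALSE FALSE TRUE TRUE
# [2,] FALSE FALSE  TRUE TRUE TRUE
# [3,]  TRUE  TRUE  TRUE TRUE TRUE

# Counting the TRUE values across each row gives the per-row result.
rowSums(above)
# [1] 2 3 5
```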

Alternative solution using the Reduce pattern

Reduce(function(l, v) cbind(l[,1] + (l[,2] > myRate), l[,-2:-1]),
       1:ncol(df1),
       cbind(0, df1))
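
A shorter variant of the same idea, offered as a sketch rather than a replacement for the code above: compare each column with myRate and let Reduce add the resulting logical vectors, which R treats as 0/1 when summing. The name counts is introduced here for illustration.

```r
df1 <- data.frame(matrix(1:15, nrow = 3))
myRate <- c(7, 5, 1)

# lapply() walks over the columns of the data frame; each comparison
# yields a logical vector with one entry per row.
flags <- lapply(df1, function(v) v > myRate)

# Adding the logical vectors element-wise counts the TRUE values per row.
counts <- Reduce(`+`, flags)
counts
# [1] 2 3 5
```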
stephematician
  • @JDL I don't have enough rep to comment yet, but do you mean `rowSums(apply(df1,2,function(v) v>thresholds))`? – stephematician Oct 21 '16 at 09:15
  • No, I don't; the second argument to apply should definitely be `1` because we want one answer for each row. Your version is calculating one answer per column. – JDL Oct 21 '16 at 10:15
  • I think we're answering different questions; the OP seems to suggest that each row has a separate constraint. `apply(,2,)` will apply the "column" of constraints (one for each row) to every column, and then `rowSums` will do the rest. – stephematician Oct 21 '16 at 11:37
  • Fair enough; the OP isn't the clearest. Hopefully we have given them the info they need! – JDL Oct 21 '16 at 11:38
  • I have not tried all of the suggestions, but `rowSums(apply(df1, 2, function(v) v > myRate))` is doing precisely what I needed. Thanks everyone for the assistance. – JimO Oct 21 '16 at 15:29