I've already read this question with an approach to counting entries in R:

how to realize countifs function (excel) in R

I'm looking for a similar approach, except that I want to count data that is within a given range.

For example, let's say I have this dataset:

data <- data.frame( values = c(1,1.2,1.5,1.7,1.7,2))

Following the approach on the linked question, we would develop something like this:

count <- data$values == 1.5
sum(count)

The problem is that I want the count to include anything within 0.2 of 1.5, that is, all numbers from 1.3 to 1.7.

Is there a way to do so?

Eric Lino

3 Answers

sum(data$values>=1.3 & data$values<=1.7)

As the explanation in the question you linked to points out, a boolean condition on its own generates a vector of TRUEs and FALSEs the same length as the column you test. TRUE counts as 1 and FALSE as 0, so summing the vector gives you a count. It then simply becomes a matter of writing your condition as a boolean expression. With more than one condition, you connect them with & (and) or | (or), much as you would in Excel (only in Excel you have to use AND() or OR()).
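To make the intermediate step visible, here is a small sketch using the question's numbers. The names `values`, `in_range` and `outside` are just illustrative.

```r
values <- c(1, 1.2, 1.5, 1.7, 1.7, 2)

# The condition alone yields one TRUE/FALSE per element
in_range <- values >= 1.3 & values <= 1.7
in_range
# [1] FALSE FALSE  TRUE  TRUE  TRUE FALSE

# TRUE is treated as 1 and FALSE as 0 when summed
sum(in_range)
# [1] 3

# | combines conditions with "or": below 1.2 or above 1.7
outside <- values < 1.2 | values > 1.7
sum(outside)
# [1] 2
```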

(For a more general solution, you can use dplyr::between, which is also supposed to be faster since it's implemented in C++. In this case, it would be sum(between(data$values,1.3,1.7)).)
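A minimal sketch of the between() variant, assuming dplyr may or may not be installed. The requireNamespace() guard and the base R fallback are additions for illustration, and `n_in_range` is a made-up name. between() includes both endpoints, so it matches the >= and <= version.

```r
values <- c(1, 1.2, 1.5, 1.7, 1.7, 2)

if (requireNamespace("dplyr", quietly = TRUE)) {
  # between(x, left, right) is shorthand for x >= left & x <= right
  n_in_range <- sum(dplyr::between(values, 1.3, 1.7))
} else {
  # Base R fallback when dplyr is not available
  n_in_range <- sum(values >= 1.3 & values <= 1.7)
}

n_in_range
# [1] 3
```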

iod

Like @doviod writes, you can use a compound logical condition.
My approach is different: I wrote a function that takes the vector and defines the range by its center point, value, and the distance from it, delta.

After a suggestion by @doviod, I have set a default value delta = 0, so that if only value is passed, the function returns a count of cases where the values equal the value the user provides (doviod, in the comment).

countif <- function(x, value, delta = 0) 
  sum(value - delta <= x & x <= value + delta)

data <- data.frame( values = c(1,1.2,1.5,1.7,1.7,2))

countif(data$values, 1.5, 0.2)
#[1] 3
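A few more calls may help show how the default behaves. This sketch repeats the function so that it runs on its own, and the example arguments are chosen only for illustration.

```r
countif <- function(x, value, delta = 0)
  sum(value - delta <= x & x <= value + delta)

values <- c(1, 1.2, 1.5, 1.7, 1.7, 2)

# With the default delta = 0 only exact matches are counted
countif(values, 1.7)
# [1] 2

# A wider window around a different center: 0.5 to 1.5
countif(values, 1, 0.5)
# [1] 3
```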
Rui Barradas
    I would make delta default to 0 (`delta=0`), in which case if you only give the value, you get a count of cases where the values equal the value you provide. – iod Aug 03 '18 at 18:38

`which` identifies the positions of all values in your vector that satisfy your criterion, and `length` then counts the 'hits'.

length( which(data$values>=1.3 & data$values<=1.7) )
[1] 3
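One practical difference from sum() shows up with missing values: which() drops NA results, whereas sum() returns NA unless na.rm = TRUE is set. A small sketch, with an NA added to the question's data purely for illustration:

```r
# Hypothetical variant of the question's data with one missing value
values <- c(1, 1.2, 1.5, NA, 1.7, 2)
in_range <- values >= 1.3 & values <= 1.7

# which() returns the positions of the TRUE entries and skips the NA
which(in_range)
# [1] 3 5

length(which(in_range))
# [1] 2

# sum() propagates the NA unless told to remove it
sum(in_range)
# [1] NA

sum(in_range, na.rm = TRUE)
# [1] 2
```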
milan
  • For some reason this approach isn't working properly. I have a vector with 814 observations, but this function returns more than 2k hits and I'm not even sure how this is possible. The way I'm implementing this logic is: `cm <- as.data.frame(matrix(c(0,0,0,0), nrow = 2))` `cm[1,1] <- length(which(subset(results, results$org>0 ) >= results$prev - mean(results$dist) & subset(results, results$org>0 ) <= results$prev + mean(results$dist)))` – Eric Lino Aug 05 '18 at 16:58
  • Not sure. If class('your data') is a vector; this should work. Make sure it's not a factor. Perhaps you could update the answer with an example when it does not work. Then we could see what else it needs. – milan Aug 05 '18 at 19:52
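One possible explanation for getting more hits than observations, offered as a guess since the real `results` object is not shown: subset() with only a row condition returns a data frame with every column, so the comparison produces a logical matrix and which() counts cells across all columns. The data frame below is a hypothetical stand-in with made-up numbers; only the column names org, prev and dist come from the comment.

```r
# Hypothetical stand-in for the commenter's `results` data frame
results <- data.frame(org  = c(1, 2, 0, 3),
                      prev = c(1, 2, 2, 3),
                      dist = c(0.1, 0.2, 0.1, 0.2))

# Row filter only: all three columns are kept
kept <- subset(results, org > 0)

# Comparing a whole data frame gives a rows-by-columns logical matrix,
# so which() can return more positions than there are rows
hits_all_columns <- length(which(kept >= 0))
hits_all_columns
# [1] 9

# Picking the single column first gives one TRUE/FALSE per row
hits_one_column <- sum(kept$org >= 0)
hits_one_column
# [1] 3
```

If that is what happened, selecting the column before comparing (for example kept$org) should bring the count back to at most the number of rows.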