
I am trying to work out, for each row of a matrix, how many columns have values greater than a specified value. I am sorry for asking such a simple question, but I wasn't able to figure it out.

I have extracted maximum temperature values from a raster stack (multiple years of rasters) for some spatial points I am interested in. The data looks similar to:

data <- cbind('1990' = c(25, 22, 35, 42, 44), '1991' = c(23, 28, 33, 40, 45), '1992' = c(20, 20, 30, 41, 43))

    1990   1991   1992
1     25     23     20
2     22     28     20
3     35     33     30
4     42     40     41
5     44     45     43

I want to end up with the number of years that the temperature was above 30 for each location, e.g.:

    yr.above   
1          0
2          0
3          2
4          3
5          3

I have tried a few things, but they didn't work and were pretty illogical (e.g. trying length(data[1:length(data), which(blah blah doesn't make sense)]), or apply(data, 1, length(data) > 30)). I know these don't make sense, but I am a bit stuck.

Adam

4 Answers


This will give you the vector you are looking for:

rowSums(data > 30)

It will work whether data is a matrix or a data.frame. It also uses vectorized functions, so it is preferable to apply, which is little more than a (slow) for loop.
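
To see why this works, here is an illustration with the question's data: data > 30 first produces a logical matrix, and rowSums() then counts the TRUE values in each row (TRUE counts as 1, FALSE as 0):

data > 30
#       1990  1991  1992
# [1,] FALSE FALSE FALSE
# [2,] FALSE FALSE FALSE
# [3,]  TRUE  TRUE FALSE
# [4,]  TRUE  TRUE  TRUE
# [5,]  TRUE  TRUE  TRUE

rowSums(data > 30)
# [1] 0 0 2 3 3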

If data is a data.frame, you can add the result as a column by doing:

data$yr.above <- rowSums(data > 30)

or if data is a matrix:

data <- cbind(data, yr.above = rowSums(data > 30))

You can also create a whole new data.frame:

data.frame(yr.above = rowSums(data > 30))

or a whole new matrix:

cbind(yr.above = rowSums(data > 30))
flodel
  • +1, though note that `data` in the OP's example is a `matrix`, not a `data.frame` – thelatemail Sep 18 '13 at 00:56
  • Thanks. It is hard to tell: `cbind` does give a matrix, but the printed data in the question suggests a `data.frame`. I have edited to address both possibilities. – flodel Sep 18 '13 at 01:04
  • Perfect! Thanks flodel. I purposefully didn't look at rowSums because I thought it would give me a sum of all the values above 30. In fact I have been using rowSums to get the summed value of my rows for a different variable... Live and learn. Cheers – Adam Sep 18 '13 at 03:25
  • You are very welcome. The idea is that `data > 30` returns a matrix of TRUE and FALSE. When you apply `rowSums` on that matrix, the TRUE and FALSE are converted into 1 and 0 respectively. – flodel Sep 18 '13 at 03:28

The third argument of apply needs to be a function. Also, you can count logical TRUEs with sum:

apply(data, 1, function(x) sum(x > 30))
mengeln

We can also do this with Reduce and `+` (assuming there are no NA elements):

Reduce(`+`, lapply(as.data.frame(data), `>`, 30))

This should be efficient as we are not converting to a matrix.
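
A quick look at the intermediate step, using the question's data: lapply() returns one logical vector per column, and Reduce(`+`, ...) then adds those vectors element-wise, with TRUE counting as 1:

lapply(as.data.frame(data), `>`, 30)
# $`1990`
# [1] FALSE FALSE  TRUE  TRUE  TRUE
#
# $`1991`
# [1] FALSE FALSE  TRUE  TRUE  TRUE
#
# $`1992`
# [1] FALSE FALSE FALSE  TRUE  TRUE

Reduce(`+`, lapply(as.data.frame(data), `>`, 30))
# [1] 0 0 2 3 3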

akrun

With the dplyr package, you can try the following two solutions.

library(dplyr)
df <- as.data.frame(data)

Option 1

df %>%
  mutate(yr.above = rowSums(across(`1990`:`1992`) > 30))

Option 2

As of dplyr 1.0.0, you can use c_across() with rowwise() to easily perform row-wise aggregations.

df %>%
  rowwise() %>%
  mutate(yr.above = sum(c_across(`1990`:`1992`) > 30)) %>%
  ungroup()

Note: One of the benefits of using dplyr is its support for tidy selections, which provide a concise dialect of R for selecting variables based on their names or properties.
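
For example, the year columns could also be picked by a property rather than by their names; a small sketch assuming all of the year columns are numeric, which gives the same output as the options above:

df %>%
  mutate(yr.above = rowSums(across(where(is.numeric)) > 30))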


Output

# # A tibble: 5 x 4
#   `1990` `1991` `1992` yr.above
#    <dbl>  <dbl>  <dbl>    <int>
# 1     25     23     20        0
# 2     22     28     20        0
# 3     35     33     30        2
# 4     42     40     41        3
# 5     44     45     43        3
Darren Tsai