How to do a sumif/countif formula in R?

Question

With the following data:

> dput(head(smart1,16))
structure(list(propertyID = c(233213.22, 233213.22, 233213.22, 
233213.22, 233213.22, 233213.22, 233213.22, 233213.22, 233213.22, 
233213.22, 233216.55, 233216.55, 233216.55, 233216.55, 233216.55, 
233216.55), UptodateTarget = c(1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 
0, 0, 1, 0, 1, 1), hourUTC = c(20, 14, 14, 18, 14, 18, 14, 15, 
15, 21, 15, 20, 15, 14, 19, 21)), row.names = 26:41, class = "data.frame")

> data.frame(smart1)
   propertyID UptodateTarget hourUTC
26   233213.2              1      20
27   233213.2              0      14
28   233213.2              1      14
29   233213.2              0      18
30   233213.2              0      14
31   233213.2              0      18
32   233213.2              0      14
33   233213.2              0      15
34   233213.2              1      15
35   233213.2              0      21
36   233216.5              0      15
37   233216.5              0      20
38   233216.5              1      15
39   233216.5              0      14
40   233216.5              1      19
41   233216.5              1      21

I am trying to add a third column, "probability" that creates this output:

> data.frame(smart1)
   propertyID UptodateTarget hourUTC probability
26   233213.2              1      20           1
27   233213.2              0      14        0.25
28   233213.2              1      14        0.25
29   233213.2              0      18           0
30   233213.2              0      14        0.25
31   233213.2              0      18           0
32   233213.2              0      14        0.25
33   233213.2              0      15         0.5
34   233213.2              1      15         0.5
35   233213.2              0      21           0
36   233216.5              0      15         0.5
37   233216.5              0      20           0
38   233216.5              1      15         0.5
39   233216.5              0      14           0
40   233216.5              1      19           1
41   233216.5              1      21           1

I want the probability column to calculate the probability of a propertyID having an UptodateTarget in a given hourUTC. It should sum the number of UptodateTarget for each propertyID in each hourUTC divided by the count of all UptodateTarget for each propertyID in each hourUTC

The sum divided by the number is the mean, or average. You want to calculate the average for each group, where the grouping is defined by `propertyID` and `hourUTC`. Lots of ways to do this, you've got a couple answer below, check out the linked Mean by Group FAQ for many more. — Gregor Thomas, Oct 31 '19 at 14:36

JDG · Accepted Answer · 2019-10-31T14:46:13.617

Using data.table:

Code

dt = smart1; require(data.table); setDT(dt)

dt[, Probability := mean(UptodateTarget), .(propertyID, hourUTC)]

(Edit: using mean(x) instead of sum(x)/length(x) alike nghauran's dplyr solution is of course more elegant)

Result

> dt
    propertyID UptodateTarget hourUTC Probability
 1:   233213.2              1      20        1.00
 2:   233213.2              0      14        0.25
 3:   233213.2              1      14        0.25
 4:   233213.2              0      18        0.00
 5:   233213.2              0      14        0.25
 6:   233213.2              0      18        0.00
 7:   233213.2              0      14        0.25
 8:   233213.2              0      15        0.50
 9:   233213.2              1      15        0.50
10:   233213.2              0      21        0.00
11:   233216.5              0      15        0.50
12:   233216.5              0      20        0.00
13:   233216.5              1      15        0.50
14:   233216.5              0      14        0.00
15:   233216.5              1      19        1.00
16:   233216.5              1      21        1.00

score 2 · Answer 2 · answered Oct 31 '19 at 14:33

With dplyr:

smart1 %>% group_by(propertyID, hourUTC) %>% mutate(Probability = mean(UptodateTarget))
# A tibble: 16 x 4
# Groups:   propertyID, hourUTC [10]
   propertyID UptodateTarget hourUTC Probability
        <dbl>          <dbl>   <dbl>       <dbl>
 1    233213.              1      20        1   
 2    233213.              0      14        0.25
 3    233213.              1      14        0.25
 4    233213.              0      18        0   
 5    233213.              0      14        0.25
 6    233213.              0      18        0   
 7    233213.              0      14        0.25
 8    233213.              0      15        0.5 
 9    233213.              1      15        0.5 
10    233213.              0      21        0   
11    233217.              0      15        0.5 
12    233217.              0      20        0   
13    233217.              1      15        0.5 
14    233217.              0      14        0   
15    233217.              1      19        1   
16    233217.              1      21        1

How to do a sumif/countif formula in R?

2 Answers2