0

With the following data:

> dput(head(smart1,16))
structure(list(propertyID = c(233213.22, 233213.22, 233213.22, 
233213.22, 233213.22, 233213.22, 233213.22, 233213.22, 233213.22, 
233213.22, 233216.55, 233216.55, 233216.55, 233216.55, 233216.55, 
233216.55), UptodateTarget = c(1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 
0, 0, 1, 0, 1, 1), hourUTC = c(20, 14, 14, 18, 14, 18, 14, 15, 
15, 21, 15, 20, 15, 14, 19, 21)), row.names = 26:41, class = "data.frame")

> data.frame(smart1)
   propertyID UptodateTarget hourUTC
26   233213.2              1      20
27   233213.2              0      14
28   233213.2              1      14
29   233213.2              0      18
30   233213.2              0      14
31   233213.2              0      18
32   233213.2              0      14
33   233213.2              0      15
34   233213.2              1      15
35   233213.2              0      21
36   233216.5              0      15
37   233216.5              0      20
38   233216.5              1      15
39   233216.5              0      14
40   233216.5              1      19
41   233216.5              1      21

I am trying to add a third column, "probability" that creates this output:

> data.frame(smart1)
   propertyID UptodateTarget hourUTC probability
26   233213.2              1      20           1
27   233213.2              0      14        0.25
28   233213.2              1      14        0.25
29   233213.2              0      18           0
30   233213.2              0      14        0.25
31   233213.2              0      18           0
32   233213.2              0      14        0.25
33   233213.2              0      15         0.5
34   233213.2              1      15         0.5
35   233213.2              0      21           0
36   233216.5              0      15         0.5
37   233216.5              0      20           0
38   233216.5              1      15         0.5
39   233216.5              0      14           0
40   233216.5              1      19           1
41   233216.5              1      21           1

I want the probability column to calculate the probability of a propertyID having an UptodateTarget in a given hourUTC. It should sum the number of UptodateTarget for each propertyID in each hourUTC divided by the count of all UptodateTarget for each propertyID in each hourUTC

Ranalytictsnewb
  • 105
  • 1
  • 1
  • 12
  • The sum divided by the number is the mean, or average. You want to calculate the average for each group, where the grouping is defined by `propertyID` and `hourUTC`. Lots of ways to do this, you've got a couple answer below, check out the linked Mean by Group FAQ for many more. – Gregor Thomas Oct 31 '19 at 14:36

2 Answers2

2

Using data.table:

Code

dt = smart1; require(data.table); setDT(dt)

dt[, Probability := mean(UptodateTarget), .(propertyID, hourUTC)]

(Edit: using mean(x) instead of sum(x)/length(x) alike nghauran's dplyr solution is of course more elegant)

Result

> dt
    propertyID UptodateTarget hourUTC Probability
 1:   233213.2              1      20        1.00
 2:   233213.2              0      14        0.25
 3:   233213.2              1      14        0.25
 4:   233213.2              0      18        0.00
 5:   233213.2              0      14        0.25
 6:   233213.2              0      18        0.00
 7:   233213.2              0      14        0.25
 8:   233213.2              0      15        0.50
 9:   233213.2              1      15        0.50
10:   233213.2              0      21        0.00
11:   233216.5              0      15        0.50
12:   233216.5              0      20        0.00
13:   233216.5              1      15        0.50
14:   233216.5              0      14        0.00
15:   233216.5              1      19        1.00
16:   233216.5              1      21        1.00
JDG
  • 1,342
  • 8
  • 18
2

With dplyr:

smart1 %>% group_by(propertyID, hourUTC) %>% mutate(Probability = mean(UptodateTarget))
# A tibble: 16 x 4
# Groups:   propertyID, hourUTC [10]
   propertyID UptodateTarget hourUTC Probability
        <dbl>          <dbl>   <dbl>       <dbl>
 1    233213.              1      20        1   
 2    233213.              0      14        0.25
 3    233213.              1      14        0.25
 4    233213.              0      18        0   
 5    233213.              0      14        0.25
 6    233213.              0      18        0   
 7    233213.              0      14        0.25
 8    233213.              0      15        0.5 
 9    233213.              1      15        0.5 
10    233213.              0      21        0   
11    233217.              0      15        0.5 
12    233217.              0      20        0   
13    233217.              1      15        0.5 
14    233217.              0      14        0   
15    233217.              1      19        1   
16    233217.              1      21        1   
nghauran
  • 6,648
  • 2
  • 20
  • 29