Sum and Count Changes per Group for each Column in R

Question

I have seen examples of using lapply across columns, but not one that considers (1) a timestamp (2) groups based on timestamp (3) detects when values change

I'm looking for a way to do the follow for an arbitrary number of Panels per Sensor (there could be Panel 3, Panel 4, etc.):

for each year, month, hour, I'm looking for the sum and Counts Turn On which is a count of the # times a value changes from 0 to a non zero number. To simplify, non-zero values at the start of the hour should not be counted towards this value (even if the previous value is 0).

take the df:

cols <- c("Timestamp","1000 Sensor 2 Panel 1","1000 Sensor 2 Panel 2")
tstmp <- seq(as.POSIXct("2018-08-13 00:00:00", tz="US/Eastern"), 
             as.POSIXct("2018-08-13 03:30:00", tz="US/Eastern"), 
             by="15 min") %>% as.data.frame()
stage1 <- c(rep(c(0,.7,1),5)) %>% as.data.frame() 
stage2 <- c(0,1,rep(c(0,.5),5),0,1,1) %>% as.data.frame()

df = cbind(tstmp,stage1,stage2)
colnames(df) = cols

I'd like the result to be results_1:

ID                      Year    Month   Hour    Sum     Count Turn On
1000 Sensor 2 Panel 1   2018        8   0       1.7         1
1000 Sensor 2 Panel 1   2018        8   1       2.4         1
1000 Sensor 2 Panel 1   2018        8   2       2.7         1
1000 Sensor 2 Panel 1   2018        8   3       1.7         1
1000 Sensor 2 Panel 2   2018        8   0       1.5         2
1000 Sensor 2 Panel 2   2018        8   1       1           2
1000 Sensor 2 Panel 2   2018        8   2       1           2
1000 Sensor 2 Panel 2   2018        8   3       2           1

For those more ambitious, I'd like to see a solution that is able to determine whether the last reading in the previous hour was 0 and the first reading in the next hour is non-zero, and is able to count that towards Count Turns On -- the solution would look like below in results_advanced:

ID                      Year    Month   Hour    Sum     Count Turn On
1000 Sensor 2 Panel 1   2018        8   0       1.7         1
1000 Sensor 2 Panel 1   2018        8   1       2.4         2
1000 Sensor 2 Panel 1   2018        8   2       2.7         1
1000 Sensor 2 Panel 1   2018        8   3       1.7         1
1000 Sensor 2 Panel 2   2018        8   0       1.5         2
1000 Sensor 2 Panel 2   2018        8   1       1           2
1000 Sensor 2 Panel 2   2018        8   2       1           2
1000 Sensor 2 Panel 2   2018        8   3       2           1

I'd like a solution for at least results_1, but would appreciate solutions for both results_1 and results_advanced. Please provide any detail you can as to your thought process and this will help me (and others) learn more.

I believe there are both data.table and dplyr solutions so I'll tag both.

score 1 · Accepted Answer · answered Dec 14 '18 at 00:13

This is the tidyverse approach to the first problem. Hopefully you can use this to approach the second part of your question.

First we convert the data from wide to long with gather from tidyr. I also remove the Timestamp variable, but that's optional.

library(lubridate); library(tidyverse)

df_long <- df %>% 
  gather(ID, Val, -Timestamp)
head(df_long)
            Timestamp                    ID Val
1 2018-08-13 00:00:00 1000 Sensor 2 Panel 1 0.0
2 2018-08-13 00:15:00 1000 Sensor 2 Panel 1 0.7
3 2018-08-13 00:30:00 1000 Sensor 2 Panel 1 1.0
4 2018-08-13 00:45:00 1000 Sensor 2 Panel 1 0.0
5 2018-08-13 01:00:00 1000 Sensor 2 Panel 1 0.7
6 2018-08-13 01:15:00 1000 Sensor 2 Panel 1 1.0

df_long <- df_long %>% 
  mutate(Year = year(Timestamp),
         Month = month(Timestamp),
         Hour = hour(Timestamp)) %>% 
  select(-Timestamp)

I then calculate the number of times turned on using dplyr::group_by and dplyr::lag, which allows you to access the previous value.

df_long <- df_long %>% 
  group_by(ID, Year, Month, Hour) %>% 
  mutate(Turned = ifelse(lag(Val) == 0 & Val != 0, 1, 0))

Then just use dplyr::summarise to calculate the final values. Note it is the group_by statement in this part is redundant, since we already grouped, but I keep it in there for clarity.

df_long %>% 
  group_by(ID, Year, Month, Hour) %>% 
  summarise(Sum = sum(Val),
            NTurned = sum(Turned, na.rm = T))

  ID                     Year Month  Hour   Sum NTurned
  <chr>                 <dbl> <dbl> <int> <dbl>   <dbl>
1 1000 Sensor 2 Panel 1  2018     8     0   1.7       1
2 1000 Sensor 2 Panel 1  2018     8     1   2.4       1
3 1000 Sensor 2 Panel 1  2018     8     2   2.7       1
4 1000 Sensor 2 Panel 1  2018     8     3   1.7       1
5 1000 Sensor 2 Panel 2  2018     8     0   1.5       2
6 1000 Sensor 2 Panel 2  2018     8     1   1         2
7 1000 Sensor 2 Panel 2  2018     8     2   1         2
8 1000 Sensor 2 Panel 2  2018     8     3   2         1

Sum and Count Changes per Group for each Column in R

1 Answers1