0

I have a table of several faults similar to one below;

fault_ID    date_reported    date_closed         fault hours
ab1        8/04/2019 1:36    8/04/2019 3:30       1.9
ab2        12/03/2019 18:48  12/03/2019 20:45     1.9
ab3        19/10/2018 22:00  20/10/2018 9:45      11.8
ab4        20/03/2019 10:07  20/03/2019 13:00     2.9
ab5        30/07/2019 8:37   30/07/2019 16:30     7.9
ab6        15/06/2019 12:58  18/06/2019 19:55     78.9
ab7        8/06/2019 22:30   8/06/2019 23:30      1.0
ab8        16/10/2018 0:45   16/10/2018 1:50      1.1
ab9        23/01/2019 1:44   23/01/2019 7:05      5.3
ab10       22/05/2019 6:19   26/07/2019 23:07     1,576.8

using R, I would like to summarise the fault hours per month to get the output below:

   Month    Total Fault hours 
   Oct-18       1.1 
   Nov-18       11.8 
   Dec-18       0.0 
   Jan-19       0.0
   Feb-19       0.0 
   Mar-19       2.0 
   Apr-19       1.9 
   May-19       210.0 
   Jun-19       78.9 
   Jul-19       7.9

I have tried a few suggestions with lubridate but I only seem to be able to get it to work for one fault_ID at a time.. I'm new to R. Thanks in advance for your help.

CLK
  • 17
  • 3
  • Welcome to SO. when you provide data like this, we cannot read that easily into R to help you with your specific problem. Better use `dput()` to provide a dump of your data. See also [how to make a great R example](https://stackoverflow.com/q/5963269/3250126). – loki Nov 22 '19 at 06:58

1 Answers1

1

It seems there is some miscalculation in the expected output but you can do this with tidyverse. First convert the columns into date-time, group_by first day of each month and sum fault.hours and the missing months in the middle can be filled using complete.

library(tidyverse)
library(lubridate)

df %>%
  mutate_at(vars(starts_with("date")), dmy_hm) %>%
  mutate(date = as.Date(format(date_reported, "%Y-%m-01")),
         #If you really have values with comma
         fault.hours = readr::parse_number(as.character(fault.hours))) %>%
  group_by(date) %>%
  summarise(hours = sum(fault.hours)) %>%
  complete(date = seq(min(date), max(date), by = "1 month"),
                  fill = list(hours = 0)) %>%
  mutate(date = format(date, "%b-%Y"))


# A tibble: 10 x 2
#   date      hours
#   <chr>     <dbl>
# 1 Oct-2018   12.9
# 2 Nov-2018    0  
# 3 Dec-2018    0  
# 4 Jan-2019    5.3
# 5 Feb-2019    0  
# 6 Mar-2019    4.8
# 7 Apr-2019    1.9
# 8 May-2019 1577. 
# 9 Jun-2019   79.9
#10 Jul-2019    7.9

data

df <- structure(list(fault_ID = structure(c(1L, 3L, 4L, 5L, 6L, 7L, 
8L, 9L, 10L, 2L), .Label = c("ab1", "ab10", "ab2", "ab3", "ab4", 
"ab5", "ab6", "ab7", "ab8", "ab9"), class = "factor"), date_reported = 
structure(c(9L, 1L, 4L, 5L, 8L, 2L, 10L, 3L, 7L, 6L), 
.Label = c("12/03/2019 18:48", "15/06/2019 12:58", "16/10/2018 0:45", 
"19/10/2018 22:00", "20/03/2019 10:07", "22/05/2019 6:19", "23/01/2019 1:44", 
"30/07/2019 8:37", "8/04/2019 1:36", "8/06/2019 22:30"), class = "factor"), 
date_closed = structure(c(9L, 1L, 5L, 4L, 8L, 3L, 10L, 2L, 6L, 7L), 
.Label = c("12/03/2019 20:45", "16/10/2018 1:50", "18/06/2019 19:55", 
"20/03/2019 13:00", "20/10/2018 9:45", "23/01/2019 7:05", "26/07/2019 23:07", 
"30/07/2019 16:30", "8/04/2019 3:30", "8/06/2019 23:30"), class = "factor"), 
fault.hours = structure(c(4L, 4L, 5L, 6L, 8L, 9L, 2L, 3L, 7L, 1L),
.Label = c("1,576.8", "1.0", "1.1", "1.9", "11.8", "2.9", "5.3", "7.9", "78.9"),
class = "factor")), class = "data.frame", row.names = c(NA, -10L))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213