0

I have a dataframe with 3 columns- start_time, end_time, energy where start_time and end_time are date time format and energy is energy spent between these two time. ![enter image description here][1]

My goal is to calculate the energy spent each day. Instances for which the start_time and end_time have same date, the energy value would be allocated to that date. But I need to find a way to classify energy values that have a different dates for start_time and end_time. For example a instance in dataframe like this-

start_time             end_time               energy
2014-06-09 20:54:10    2014-06-11 05:04:14    1114

Should produce instances like these in output data frame-

date        energy
2014-06-09  <energy consumed between 2014-06-09 20:54:10 to 2014-06-09 23:59:59>
2014-06-10  <energy consumed between 2014-06-10 00:00:00 to 2014-06-10 23:59:59>
2014-06-11  <energy consumed between 2014-06-11 00:00:00 to 2014-06-11 05:04:14>
Zeeshan
  • 1,248
  • 1
  • 12
  • 19
  • So... explain what you've tried? – hd1 Jul 20 '15 at 20:57
  • I am new to R, so tried a very naive approach, I took the **start_time** and used the **ceiling_date(x, 'days')** function to find the end of the day date time. Then added a offset of 1 second to it to start the next day. This was done until the date of **start_time** and **end_time** are same. This splitted the time stamps into days. I then found the proportion of **energy** to be allocated to each day depending on the time. Wanted to know if there was a better way (library) to handle this task as its some thing very basic. Thanks – Zeeshan Jul 20 '15 at 21:12
  • You've only given one example row. Is it safe to assume that subsequent rows don't overlap? For instance could row 2 start on 2014-06-10? If so how should that look? – Dean MacGregor Jul 20 '15 at 21:51
  • Yes its safe to assume that the time do not overlap. They may be on the same date but the hour or minute or second would be different – Zeeshan Jul 20 '15 at 21:56
  • http://stackoverflow.com/questions/13912282/subset-xts-object-by-time-of-day hope that helps, Zeeshan – hd1 Jul 20 '15 at 23:31

1 Answers1

0

I've not tested it much (the supplied dataframe was a bit sparse..) , but this seems to work okay.

calcEnergy <- function(startCol, endCol, valCol) {
    require(chron)
    # calculate start and finish times
    chron.fun <- function(x) chron(x[1], x[2], format=c('y-m-d','h:m:s'))
    starts <- unlist(lapply(strsplit(as.character(startCol), " "), chron.fun))
    ends <- unlist(lapply(strsplit(as.character(endCol), " "), chron.fun))
    # need to expand dataframe out to accomodate new rows, so calculate number of 
    # rows per original observation
    nrows <- ceiling(ends) - floor(starts)
    # ..& create expanded dataframe based on this
    df.out <- data.frame(start_time = rep(starts, nrows) + sequence(nrows)-1,
                       end_time = rep.int(ends, nrows) - (rep(nrows,nrows) -sequence(nrows)),
                       valCol = rep.int(valCol, nrows),
                       tDiffs = rep.int(ends - starts, nrows))
    # identify non-original starts and finishes (which are unique)
    startIndex <- !df.out$start_time %in% starts
    endIndex <- !df.out$end_time %in% ends
    # floor or ceiling accordingly
    df.out$start_time[startIndex] <- floor(df.out$start_time[startIndex])
    df.out$end_time[endIndex] <- ceiling(df.out$end_time[endIndex])
    # calculate proportion energy per day
    df.out$energy <- with(df.out, valCol*(end_time-start_time)/tDiffs)
    # reformat cols
    df.out$date <- chron(floor(df.out$start_time), out.format='y-m-d')
    df.out[c("date", "energy")]
}
Simon Mills
  • 188
  • 9