0

I have this data.frame:

                 Time    a    b    c     d
1 2015-01-01 00:00:00  863 1051 1899 25385
2 2015-01-01 01:00:00  920 1009 1658 24382
3 2015-01-01 02:00:00 1164  973 1371 22734
4 2015-01-01 03:00:00 1503  949  779 21286
5 2015-01-01 04:00:00 1826  953  720 20264
6 2015-01-01 05:00:00 2109  952  743 19905
...
                    Time   a   b    c     d
8756 2015-12-31 19:00:00   0 775 4957 28812
8757 2015-12-31 20:00:00   0 783 5615 29568
8758 2015-12-31 21:00:00   0 790 4838 28653
8759 2015-12-31 22:00:00   0 766 3841 27078
8760 2015-12-31 23:00:00  72 729 2179 24565
8761 2016-01-01 00:00:00 290 710 1612 23311

It represents every hour of every day for a year. I would like to extract one line per day, as a function of the maximum value of d. So at the end I want to obtain a data.frame of 365x5.

I have tried all the propositions from :Extract the maximum value within each group in a dataframe and also:Daily minimum values in R but it still doesn't work.

May be it could come from the way I proceed to generate my time serie?

library(lubridate)
start <- dmy_hms("1 Jan 2015 00:00:00")
end <- dmy_hms("01 Jan 2016 00:00:00")
time <- as.data.frame(seq(start, end, by="hours"))

Thanks for help!

  • Are these `mydf` and `df` different objects?. If it is a data.table, use `setDT(mydf)[, .(d = max(d)), by = .(Day = as.Date(Time))]` – akrun Dec 18 '18 at 02:04

3 Answers3

3

If we are aggregating by the 'Day', convert the 'Time' column to Date class stripping off the Time attributes, grouped by those, get the max of 'd'. In the OP's post, the syntax for data.table involves mydf and df. Assuming these are the same, we need

library(data.table)
setDT(mydf)[, .(d = max(d)), by = .(Day = as.Date(Time))]

Or using aggregate from base R

aggregate(d ~ Day, transform(mydf, Day = as.Date(Time)), FUN = max)

Or with tidyverse

library(tidyverse)
mydf %>%
   group_by(Day = as.Date(Time)) %>%
   summarise(d = max(d))

NOTE: Based on the OP's comments, columns 'a' to 'd' are factor class. We need to convert it to numeric either at the beginning or convert it during the processing stage

mydf$d <- as.numeric(as.character(mydf$d)))

For multiple columns

mydf[c('a', 'b', 'c', 'd')] <- lapply(mydf[c('a', 'b', 'c', 'd'), function(x)
       as.numeric(as.character(x)))

data

mydf <- structure(list(Time = c("2015-01-01 00:00:00", "2015-01-01      01:00:00", 
 "2015-01-01 02:00:00", "2015-01-01 03:00:00", "2015-01-01 04:00:00", 
 "2015-01-01 05:00:00"), a = c(863L, 920L, 1164L, 1503L, 1826L, 
 2109L), b = c(1051L, 1009L, 973L, 949L, 953L, 952L), c = c(1899L, 
 1658L, 1371L, 779L, 720L, 743L), d = c(25385L, 24382L, 22734L, 
 21286L, 20264L, 19905L)), class = "data.frame", row.names = c("1", 
  "2", "3", "4", "5", "6"))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • for both it returns `Error in Summary.factor(c(2144L, 1669L, 947L, 392L, 164L, 95L, 111L, 185L, : ‘max’ not meaningful for factors` – Sebastien_H Dec 18 '18 at 02:17
  • @Sebastien_H returns what? – akrun Dec 18 '18 at 02:18
  • @Sebastien_H The description in your question is `one that contains the maximum value of d of the day.` – akrun Dec 18 '18 at 02:20
  • @Sebastien_H I assumed that you the columns 'a' to 'd' are `numeric`. There must be some issue while creating the data.frame. You can do one thing before doing the aggregation. ie. `mydf[-1] <- lapply(mydf[-1], function(x) as.numeric(as.character(x)))` – akrun Dec 18 '18 at 02:22
  • `Error in summarise_impl(.data, dots) : Evaluation error: ‘max’ not meaningful for factors.` – Sebastien_H Dec 18 '18 at 02:23
0

'max' doesn't work with factors. Hence convert the column (in your case, its column d) for which you are finding the maximum into double using as.numeric Assuming your data set is in a data frame

mydf$d = as.numeric(mydf$d)
noob
  • 9
  • 4
0

Thanks for your help! Finally I choose

do.call(rbind, lapply(split(test,test$time), function(x) {return(x[which.max(x$d),])}))

which allows me to have a 365x5 data.frame. All your propositions were right. I just needed to change my time serie like

time <- as.data.frame(rep(c(1:365), each = 24))
test<- cbind.data.frame(time, df, timebis)

which allows me to have a 365x5 data.frame. All your propositions were right. I just needed to change my time serie.