0

I have a dataset with two column: one is the measured time, the other is value. Because there are many measurement methods, "several" values can occur in one specific date. The data below is one example:

ex_data <-
    data.frame(
        measuretime = c(
            "1994/1/6 15:00",
            "1994/1/7 15:00",
            "1994/1/7 15:00",
            "1994/1/8 15:00"),
        value = c(2.4, 1.8, 2.0, 1.4)
    )

In this example of "ex_data", "1994/1/7 15:00" has two records: 1.8 and 2.0. I want to use their mean value (i.e., 1.9) to replace them. After doing so, every measuretime has one single record. That is:

 measuretime    Value
1994/1/6 15:00  2.4
1994/1/7 15:00  1.9
1994/1/8 15:00  1.4

Here is what I have tried. I have standardized the time:

library(lubridate) 
ex_data$measuretime <- ymd_hm(ex_data$measuretime) 

However, I don't know how to locate these "same date" records and replace them with a mean value?

T X
  • 505
  • 1
  • 9
  • 19
  • 1
    `aggregate(value~measuretime, ex_data, mean)` or with `dplyr` `ex_data %>% group_by(measuretime) %>% summarise(value = mean(value))` – Ronak Shah Feb 11 '19 at 13:56
  • 1
    Use `setDT(ex_data)[, .(value = mean(value)), by = measuretime]` – akrun Feb 11 '19 at 13:57
  • 1
    I don't know why I cannot post this as an answer. I wanted to reply: `library(purrrlyr)` `ex_data %>% slice_rows("measuretime") %>% by_slice(dmap, mean, .collate="rows")` – Stéphane Laurent Feb 11 '19 at 14:01

1 Answers1

1

Using sqldf:

library(sqldf)
sqldf("select measuretime,avg(value) from ex_data group by measuretime")

OR

plyr::ddply(ex_data,~measuretime,summarise,mean=mean(value))

Output:

        measuretime avg(value)
 1 1994/1/6 15:00        2.4
 2 1994/1/7 15:00        1.9
 3 1994/1/8 15:00        1.4
Saurabh Chauhan
  • 3,161
  • 2
  • 19
  • 46