0

I have searched various R aggregation questions here on stackoverflow (e.g Aggregating hourly data into daily aggregates) but none addresses multivariate table in long format.

My table is hourly table of observed and modeled values (for a whole year), for each site (up to eight) as shown below:

date    obs mod site
2017-01-01 00:00:00 1.2 -0.7    Carib
2017-01-01 01:00:00 3.1 -0.9    Carib
2017-01-01 02:00:00 2.1 -0.3    Carib
..
..
2017-02-17 10:00:00 2   1.5 Halley
2017-02-17 11:00:00 2.7 1.8 Halley
2017-02-17 12:00:00 3   2.2 Halley
..
..
2017-03-13 13:00:00 5.6 5.6 Yules
2017-03-13 14:00:00 6.5 5.0 Yules
2017-03-13 15:00:00 7.5 4.6 Yules

Below is the result I would like to have (excluding missing data)

date max_obs    max_mod mean_obs mean_mod  site
2017-01-01 -0.7 3.1 -0.9 0.9 Carib
2017-01-02 0.2 -1.5 -0.3 0.5 Carib
..
..
2017-02-17 2.2 1.5 1.1 0.8 Halley
2017-02-18 1.6 1.9 1.2 0.9 Halley
..
..
2017-03-13 5.6 5.2 4.7 5.0 Yules
2017-03-14 5.0 5.2 4.9 5.2 Yules
..

I imported my data as table, and tried daily averaging with date <- as.Date(DT$date,"%Y-%m-%d") but not getting what I want. Any help will be appreciated.

Gautam
  • 2,597
  • 1
  • 28
  • 51
Chike
  • 21
  • 1

1 Answers1

0

I'm not sure what do you mean with the "table": whether you are using data.table package or just read.table() function of the base R. I'll give an answer which works for both approaches but is suboptimal if you are using data.table.

The most convenient way to solve your problem is to use packages of the tidyverse family: dplyr for data manipulation and lubridate for transformation of the dates, like this

library(dplyr)
library(lubridate)

res_df <- obs_df %>%
    mutate(calend_day = as_date(date)) %>%
    group_by(calend_day, site) %>%
    summarize(
        max_obs = max(obs),
        max_mod = max(mod),
        mean_obs = mean(obs),
        mean_mod = mean(mod)
        ) %>%
    # set a desired order of columns
    select(calend_day, max_obs, max_mod, mean_obs, mean_mod, site)

Type of the date column is supposed to by a character, so stringsAsFactors = FALSE should be set when using read.table().

Hope, it helps :)

Ekatef
  • 1,061
  • 1
  • 9
  • 12