0

I have real-time data on NO2 that was collected with an NO2 sensor. The data has 48-hour NO2 logged per second. NO2 was measured for 48 hours in different homes. There are over 50,000 rows and three columns. Here a snapshot.

Homes   Time      ppb
Home 1  12:00 AM    6
Home 1  12:01 AM    6
Home 1  12:02 AM    7
Home 1  12:03 AM    6
Home 1  12:04 AM    7
Home 1  12:05 AM    9
Home 1  12:06 AM    8
Home 2  12:00 AM    2
Home 2  12:01 AM    3
Home 2  12:02 AM    4
Home 2  12:03 AM    1
Home 2  12:04 AM    6
Home 2  12:05 AM    4
Home 2  12:06 AM    8

I want to create a new dataset which will have hourly (12 am, 1 am, etc) averages of NO2 instead of per second. I want to make 1 am to be the average of 12:01 am - 1:00 am and continue with this trend.

I tried a number of methods, but I'm yet to figure how to do this. The group_by and summarize functions gave me averages of time (e.g. 1 am, 2 am) in the entire dataset, but I want hourly averages according to the home numbers.

I tried using a loop which didn't work:

Avg = data.frame(Sensor_analysis)    
head(Avg)    
Hourly = rep(0, 48)    
for (i in 1:48) {
   Hourly[i] = mean(Avg$ppb[60*(i-1)+1:60*i])
}
Hourly
nicola
  • 24,005
  • 3
  • 35
  • 56
Ophir
  • 1
  • 4
  • You can use `cut` i guess. Also have a look [there](https://stackoverflow.com/questions/17389533/aggregate-values-of-15-minute-steps-to-values-of-hourly-steps) – user2100721 Jun 17 '17 at 04:43
  • can you provide a copy&pasteable version of your data? Just run `dput(head(Sensor_analysis))`. – RoyalTS Jun 17 '17 at 04:55
  • @RoyalTS I ran dput(head(Sensor_analysis)) in R, what do I do next to make it copy and pasteable on this platform? – Ophir Jun 18 '17 at 07:42
  • Just edit your question and paste it in there. – RoyalTS Jun 18 '17 at 07:45

1 Answers1

1

Your question is really far, far away from the ideal of a fully reproducible example but try this on for size:

library(dplyr)
library(lubridate)

Sensor_analysis %>%
    mutate(hour = ceiling_date(Time, 'hour')) %>%
    group_by(Homes, hour) %>%
    summarize(hourly = mean(ppb)) -> Hourly

Depending on how your Time variable is stored you may first have to convert it either to POSIXct or to hms for this to work

RoyalTS
  • 9,545
  • 12
  • 60
  • 101
  • Thank you @RoyalTS it worked. I want to know how the hourly averages are calculated. After I ran the code, I see that it gave hourly (e.g. 1:00, 2:00) averages. Is this the mean concentration for 1:00-1:59? I compared the results to what I did manually in Excel, but it was different. Another question, I want other variables to be in the new dataset we created. The original dataset has 10 columns (Homes, Time, ppb, Heat, Ventilation, Cook, TRAP, Dst_Hwy, Dst_Int, Trucks). – Ophir Jun 18 '17 at 07:32