0

I'm having problem with calculating the average for 2 variables in my data. I have temperature and speed collected as shown below. I want to minimize the data and have hourly readings only (without minute). I want to take the average of the temperature and the speed for each id based on the time and the day.

ID    temp  Speed   Day    Hour    Minute    Latitude    Longitude
1      3      20    1      11      10      38.9294865  -77.2479055
1      5      25    1      11      30      38.9294865  -77.2479055
1      5      30    1      12      12      38.9294865  -77.2479055
1      6      20    1      12      40      38.9294865  -77.2479055
2      1      40    2      11      05      38.9294771  -77.2478712
2      5      30    2      11      50      38.9294771  -77.2478712
2      2      20    2      12      30      38.9294771  -77.2478712
2      8      10    2      12      40      38.9294771  -77.2478712

My desired data looks like this:

ID    temp  Speed   Day    Hour    Minute    Latitude    Longitude
1      4      22.5   1      11      00      38.9294865  -77.2479055
1      5.5    25     1      12      00      38.9294865  -77.2479055
2      3      30     2      11      00      38.9294771  -77.2478712
2      5      15     2      12      00      38.9294771  -77.2478712

I thought about creating a column with the hour and the minute like this :

Data$HM <- as.date(with(Data, paste(Hour, Minute ,sep=":")), "%H:%M")

And then based on the new column I was thinking to try this code:

AvrgData<- aggregate(Data[, 2:3], list(Data$HM), mean)

However my code is not correct. Any suggestions will be much appreciated! I went through these links but still not helping with the results I want.

How to average columns based on ID in R?

Merge three different columns into a date in R

Thank you

Sandy
  • 1,100
  • 10
  • 18
Reta
  • 363
  • 3
  • 4
  • 15
  • 1
    For starters, `as.date` is not a function - try `as.Date`. After you've done that, check the results of `Data$HM` - Dates in R do not store hours and minutes. Try `as.POSIXct` instead to get a date/time. Or don't worry about the date conversion at all, just use the `paste`. – thelatemail Nov 19 '19 at 00:38

1 Answers1

2

With aggregate you can get mean of temp and Speed grouped by ID, Day, Hour, Latitude and Longitude and create a column for Minute with value as 0.

transform(aggregate(cbind(temp, Speed)~ID + Day + Hour + Latitude + Longitude, 
          df, mean), Minute = 0)

#  ID Day Hour Latitude Longitude temp Speed Minute
#1  1   1   11    38.93    -77.25  4.0  22.5      0
#2  1   1   12    38.93    -77.25  5.5  25.0      0
#3  2   2   11    38.93    -77.25  3.0  35.0      0
#4  2   2   12    38.93    -77.25  5.0  15.0      0

The same logic implemented in dplyr, would be

library(dplyr)

df %>%
  group_by(ID, Day, Hour, Latitude, Longitude) %>%
  summarise_at(vars(temp, Speed), mean) %>%
  mutate(Minute = 0)

data

df <- structure(list(ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), temp = c(3L, 
5L, 5L, 6L, 1L, 5L, 2L, 8L), Speed = c(20L, 25L, 30L, 20L, 40L, 
30L, 20L, 10L), Day = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), Hour = c(11L, 
11L, 12L, 12L, 11L, 11L, 12L, 12L), Minute = c(10L, 30L, 12L, 
40L, 5L, 50L, 30L, 40L), Latitude = c(38.9294865, 38.9294865, 
38.9294865, 38.9294865, 38.9294771, 38.9294771, 38.9294771, 38.9294771
), Longitude = c(-77.2479055, -77.2479055, -77.2479055, -77.2479055, 
-77.2478712, -77.2478712, -77.2478712, -77.2478712)), class = "data.frame", 
row.names = c(NA, -8L))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213