I'm trying to average RAIN by HOUR. The data consists of rainfall recorded over 24 hours at more than 1000 stations. Each HOUR usually has 4 recordings, but for some hours there are only 1, 2 or 3. I need the average RAIN for each HOUR at each STATION. Sample data:

STN,     HOBLINAME,   LATI,      LONG_,    RAINDATE, HOUR,  RAIN
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  0,    3.5
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  0,    3
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  0,    3
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  0,    2.5
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  1,    0
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  1,    1
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  1,    2
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  2,    0
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  2,    0
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  2,    0
4471,   Adagal (GP), 15.952089, 75.673282, 14-08-17,  2,    0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  0,   7.5
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  1,   7
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  1,   6.5
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  2,   6
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  2,   6
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  2,   5.5
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  2,   5
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  21,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  21,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  21,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  21,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  22,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  22,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  22,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  22,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  23,   0
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  23,   2
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  23,   2.5
804,    BADAMI,      15.919473, 75.683335, 14-08-17,  23,   3

I tried with :

copy14   <- read.csv("/home/14copy.csv")
aggregate( RAIN ~ HOUR, copy14, FUN = mean )

but this averages each hour across all stations together (e.g. hour 0 of every station is averaged into a single value). What I want is the average for each hour of each station separately, i.e. here RAIN for station 4471 and for station 804 must be averaged separately. Finally, how should I write out this final average along with all its associated fields?

Mohan Singh
Ajay
  • Please share the output of `dput(head(copy14))` – JDG Nov 14 '19 at 08:57
  • structure(list(STN = c(4471L, 4471L, 4471L, 4471L, 4471L, 4471L ), HOBLINAME = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c(" BADAMI", " Adagal (GP)"), class = "factor"), LATI = c(15.952089, 15.952089, 15.952089, 15.952089, 15.952089, 15.952089), LONG_ = c(75.673282, 75.673282, 75.673282, 75.673282, 75.673282, 75.673282), RAINDATE = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = " 14-08-17", class = "factor"), HOUR = c(0L, 0L, 0L, 0L, 1L, 1L), RAIN = c(3.5, 3, 3, 2.5, 0, 1)), row.names = c(NA, 6L), class = "data.frame") – Ajay Nov 14 '19 at 09:04
  • `aggregate( RAIN ~ STN + HOUR, copy14, FUN = mean )` – Ronak Shah Nov 14 '19 at 09:07
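To illustrate the comment above, here is a minimal base R sketch of grouping by both STN and HOUR. The data frame `rain` and the result name `hourly` are made up for this example; the rows are a small subset of the sample data typed inline so the code runs without the CSV file.

```r
# Small subset of the sample rows from the question, typed inline so the
# sketch runs without the CSV file ('rain' is a hypothetical name).
rain <- data.frame(
  STN  = c(4471, 4471, 4471, 4471, 4471, 4471, 4471, 804, 804, 804),
  HOUR = c(0, 0, 0, 0, 1, 1, 1, 0, 1, 1),
  RAIN = c(3.5, 3, 3, 2.5, 0, 1, 2, 7.5, 7, 6.5)
)

# One mean per STN/HOUR combination, however many readings each hour has.
hourly <- aggregate(RAIN ~ STN + HOUR, data = rain, FUN = mean)

# Sort by station, then hour, for readability.
hourly <- hourly[order(hourly$STN, hourly$HOUR), ]
print(hourly)
```

Because the grouping is on STN and HOUR together, an hour with only 1, 2 or 3 readings is simply averaged over the readings it has.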

3 Answers

Using data.table:

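library(data.table)
setDT(copy14)  # convert the data frame to a data.table by reference

# mean RAIN for each station/hour combination
copy14[, .(MeanRain = mean(RAIN)), by = .(STN, HOUR)]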
JDG

Using the dplyr library, we simply group and summarise like this:

library(dplyr)
copy14 <- read.csv("rain.csv")

# mean RAIN for each hour/station combination
copy14 %>%
  group_by(HOUR, STN) %>%
  summarise(RAIN = mean(RAIN))
thomas
Fnguyen

To build on your first attempt with aggregate, here is a solution along those lines. aggregate expects a list or data frame in its by argument, which defines the groups that the function is applied to. In my view group_by plus summarise is the smoother solution, but this approach is worth showing as well.

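library(dplyr)

copy14 <- read.csv("R/rain.csv")

# Average only RAIN: applying mean to every column would give NA
# (with warnings) for text columns such as HOBLINAME and RAINDATE.
data <- copy14 %>%
  select(RAIN) %>%
  aggregate(by = copy14 %>%
              select(STN, HOUR),
            FUN = mean)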
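
For the last part of the question (writing the averages out together with the associated fields), one possible approach is sketched below. It assumes that HOBLINAME, LATI, LONG_ and RAINDATE are constant within each station/hour group, so they can be added to the grouping formula without changing the groups. The names `rain`, `hourly` and `out_file` and the output location are hypothetical, and the data is a small inline subset of the sample.

```r
# Inline subset of the sample data ('rain' is a hypothetical name).
rain <- data.frame(
  STN       = c(4471, 4471, 804, 804),
  HOBLINAME = c("Adagal (GP)", "Adagal (GP)", "BADAMI", "BADAMI"),
  LATI      = c(15.952089, 15.952089, 15.919473, 15.919473),
  LONG_     = c(75.673282, 75.673282, 75.683335, 75.683335),
  RAINDATE  = c("14-08-17", "14-08-17", "14-08-17", "14-08-17"),
  HOUR      = c(0, 0, 1, 1),
  RAIN      = c(3.5, 3, 7, 6.5)
)

# Listing the descriptive columns on the right-hand side keeps them in the
# result. This assumes they do not vary within a station/hour group.
hourly <- aggregate(
  RAIN ~ STN + HOBLINAME + LATI + LONG_ + RAINDATE + HOUR,
  data = rain,
  FUN  = mean
)

# Write the averaged table to a CSV file (hypothetical path in the temp dir).
out_file <- file.path(tempdir(), "hourly_rain.csv")
write.csv(hourly, out_file, row.names = FALSE)
```

If any of the descriptive columns did vary within a station and hour, this would split that group into several rows, so it is worth checking the row count against the STN + HOUR result.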
thomas