0

I have data set in following format:

STATION     CODE  DATE     HOUR hr_rain
SHIVAMOGGA  163 06/09/18    00   1.0
SHIVAMOGGA  163 06/09/18    04   1.0
SHIVAMOGGA  163 06/09/18    05   NA
SHIVAMOGGA  163 06/09/18    06   1.5
SHIVAMOGGA  163 06/09/18    07   2.5
SHIVAMOGGA  163 06/09/18    08   NA
SHIVAMOGGA  163 06/09/18    09   0.0
SHIVAMOGGA  163 06/09/18    10   0.5
SHIVAMOGGA  163 06/09/18    11   0.5
SHIVAMOGGA  163 06/09/18    12   NA
SHIVAMOGGA  163 06/09/18    13   NA
SHIVAMOGGA  163 06/09/18    14   0.5
SHIVAMOGGA  163 06/09/18    15   0.5
SHIVAMOGGA  163 06/09/18    16   0.5
SHIVAMOGGA  163 06/09/18    17   0.5
SHIVAMOGGA  163 06/09/18    18   0.5
SHIVAMOGGA  163 06/09/18    19   0.5
SHIVAMOGGA  163 06/10/19    03   0.5
SHIVAMOGGA  163 06/10/19    05   NA
SHIVAMOGGA  163 06/10/19    06   NA
SHIVAMOGGA  163 06/10/19    07   NA
SHIVAMOGGA  163 06/10/19    08   0.5
SHIVAMOGGA  163 06/10/19    09   0.0
SHIVAMOGGA  163 06/10/19    10   0.0

Here the parameter rainfall is in hourly accumulated format. I'm interested into hourly rainfall rate. The measurement starts everyday at 09 HOUR & sometimes observations are missing, so I tried to fill NA values (3 or more consecutive NA's will be left unchanged and consecutive NA's less than 2 in number are replaced & NA's at 8 HOUR has given the previous value) by grouping for 09 HOUR.

df1 <- df %>% 
  group_by(STATION, CODE, gr = cumsum(HOUR == '09')) %>% 
  mutate(hr_rain = na.approx(hr_Rain, rule = 2, maxgap = 2, na.rm = FALSE))

Again to calculate hourly rainfall rate I tried to group df1 with:

hourly_df <- df1 %>% 
  group_by(STATION, CODE , grp = cumsum(HOUR == '09')) %>% 
  mutate(RAINFALL = hr_rain - lag(hr_rain, default = 0))

But it's not working. It creates first group and then second group continues till end of dataframe. The result is coming like:

STATION     CODE  DATE     HOUR hr_rain  NUM_NA  gp  grp  RAINFALL
SHIVAMOGGA  163 06/09/18    00   1.0       2      0   0     1
SHIVAMOGGA  163 06/09/18    04   1.0       2      0   0     0
SHIVAMOGGA  163 06/09/18    05   1.25      1      0   0     0.25
SHIVAMOGGA  163 06/09/18    06   1.5       1      0   0     0.25
SHIVAMOGGA  163 06/09/18    07   2.5       1      0   0     1
SHIVAMOGGA  163 06/09/18    08   2.5       1      0   0     0
SHIVAMOGGA  163 06/09/18    09   0.0       1      1   1     -2.5
SHIVAMOGGA  163 06/09/18    10   0.5       2      1   1     0.5
SHIVAMOGGA  163 06/09/18    11   0.5       2      1   1     0
SHIVAMOGGA  163 06/09/18    12   0.5       2      1   1     0
SHIVAMOGGA  163 06/09/18    13   0.5       2      1   1     0
SHIVAMOGGA  163 06/09/18    14   0.5       7      1   1     0
SHIVAMOGGA  163 06/09/18    15   0.5       7      1   1     0
SHIVAMOGGA  163 06/09/18    16   0.5       7      1   1     0
SHIVAMOGGA  163 06/09/18    17   0.5       7      1   1     0
SHIVAMOGGA  163 06/09/18    18   0.5       7      1   1     0
SHIVAMOGGA  163 06/09/18    19   0.5       7      1   1     0
SHIVAMOGGA  163 06/10/19    03   0.5       7      1   1     0
SHIVAMOGGA  163 06/10/19    05   NA        3      1   1     NA
SHIVAMOGGA  163 06/10/19    06   NA        3      1   1     NA
SHIVAMOGGA  163 06/10/19    07   NA        3      1   1     NA
SHIVAMOGGA  163 06/10/19    08   0.5       1      1   1     0.5
SHIVAMOGGA  163 06/10/19    09   0.0       2      2   1     -0.5
SHIVAMOGGA  163 06/10/19    10   0.0       2      2   1     0

With this for 9 HOUR I'm getting negative values, which I want to start with value of hr_rain (that's why I was trying to create another grouping by 09 HOUR) for that field. Thanks in advance for any help!

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
Ajay
  • 320
  • 2
  • 11
  • 1
    You might need to `ungroup()` before grouping again? – neilfws Jan 30 '20 at 03:22
  • @neilfws with ```ungroup()``` it's creating desire groups but it taking difference of two fields continuously. It should restart from 09 HOUR again but it's not happening. – Ajay Jan 30 '20 at 04:23
  • 1
    If the grouping variable is created correctly, but `mutate` doesn't seem to be working by group, then you probably loaded `plyr` after `dplyr`, ignored the warning, and are inadvertently using `plyr::mutate` when you need `dplyr::mutate`. [See this FAQ](https://stackoverflow.com/q/26106146/903061), and maybe try specifying `dplyr::mutate` – Gregor Thomas Jan 30 '20 at 04:30
  • @Gregor-reinstateMonica I have already loaded ```plyr``` first and then loaded ```dplyr``` – Ajay Jan 30 '20 at 04:59

1 Answers1

1

Since both the groups are same no need to calculate them differently, you can combine them and calculate hr_rain and RAINFALL together.

library(dplyr)

df %>% 
  group_by(STATION, CODE, gr = cumsum(HOUR == '09')) %>% 
  mutate(hr_rain = zoo::na.approx(hr_rain, rule = 2, maxgap = 2, na.rm = FALSE), 
         RAINFALL = hr_rain - lag(hr_rain, default = 0)) 

data

df <- structure(list(STATION = c("SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", 
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", 
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", 
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", 
"SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", "SHIVAMOGGA", 
"SHIVAMOGGA"), CODE = c(163, 163, 163, 163, 163, 163, 163, 163, 
163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 163, 
163, 163, 163), DATE = c("06/09/18", "06/09/18", "06/09/18", 
"06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18", 
"06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18", "06/09/18", 
"06/09/18", "06/09/18", "06/10/19", "06/10/19", "06/10/19", "06/10/19", 
"06/10/19", "06/10/19", "06/10/19"), HOUR = c("00", "04", "05", 
"06", "07", "08", "09", "10", "11", "12", "13", "14", "15", "16", 
"17", "18", "19", "03", "05", "06", "07", "08", "09", "10"), 
hr_rain = c(1, 1, NA, 1.5, 2.5, NA, 0, 0.5, 0.5, NA, NA, 
0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, NA, NA, NA, 0.5, 0, 0)), row.names = c(NA, 
-24L), class = "data.frame")
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • It's not working as expected. Still it's doing subtraction of 9 HOUR and 8 HOUR for 9 HOUR which leads to negative value for 9 HOUR. – Ajay Jan 30 '20 at 04:11
  • Further I did first grouping so that instead of assigning interpolated value to NA of 8 HOUR it should took previous available (7 HOUR) value for NA in 8 HOUR but it's not working in that way. ```na.approx()``` is interpolating for 8 HOUR also. – Ajay Jan 30 '20 at 04:17
  • @Ajay It gives me 0 for all the values at 9 hour and all the groups change at 9 hour as well in `gr`. Can you check with the data I have posted? Do you data different than that? – Ronak Shah Jan 30 '20 at 04:26
  • with your data also I'm getting same output. It's giving result of subtraction of 9 HOUR and 8 HOUR. ```STATION CODE DATE HOUR hr_rain gr RAINFALL 5 SHIVAMOGGA 163 06/09/18 07 2.5 0 1 6 SHIVAMOGGA 163 06/09/18 08 1.25 0 -1.25 7 SHIVAMOGGA 163 06/09/18 09 0 1 -1.25 8 SHIVAMOGGA 163 06/09/18 10 0.5 1 0.5 9 SHIVAMOGGA 163 06/09/18 11 0.5 1 0 10 SHIVAMOGGA 163 06/09/18 12 0.5 1 0 # …with 14 more rows``` – Ajay Jan 30 '20 at 05:10
  • 1
    @Ajay yes, in that case there is a conflict as Gregor mentioned, use `dplyr::mutate` instead of just `mutate`. – Ronak Shah Jan 30 '20 at 05:12
  • Yes, It worked with ```dplyr::mutate```. But I loaded first ```plyr```and then loaded ```dplyr``` then still why this is happening. – Ajay Jan 30 '20 at 05:17
  • When there are functions with same name from different packages it is always good to refer them as `packagename::function_name`. – Ronak Shah Jan 30 '20 at 05:18