Retain values in dataframe that do not meet certain conditions

Question

I have daily precipitation data from 1880 to 2011. The data is in a data frame called STATION and takes the form:

STATION: 47486 obs. of 4 variables
  Year: int 1880 1880 ...
  Month: int 1 1 1 ...
  Day: int 1 2 3 ...
  PPT: num 0.4 0 0 ...

I have used the following to compute mean monthly precipitation, using only days with at least 0.2 mm:

library(dplyr)
MONTHLY.MEAN <- STATION %>% group_by(Year, Month) %>%
  filter(PPT >= 0.2) %>% summarise(s = mean(PPT))  # a month with no day >= 0.2 loses all its rows here

This works fine, but there is one month in the record (April 2007) with no days at or above 0.2 mm, so that month is dropped from the output. I want it to be included with a value of zero even though it doesn't meet the condition in the filter. Can this be done?

I hope this makes sense.
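A minimal reproducible sketch of the problem, assuming a tiny hypothetical stand-in for STATION (the `toy` data and its values are invented for illustration). It uses base R's `aggregate()` in place of the dplyr pipeline so it runs without extra packages; subsetting the rows first and then grouping behaves like `filter()` before `summarise()`:

```r
# Hypothetical toy data standing in for STATION (values invented for illustration)
toy <- data.frame(
  Year  = rep(2007L, 4),
  Month = c(3L, 3L, 4L, 4L),
  Day   = c(1L, 2L, 1L, 2L),
  PPT   = c(0.4, 0.0, 0.1, 0.0)
)

# Dropping dry days first and then grouping mirrors filter() before summarise():
# April (Month 4) has no rows left after the subset, so no group is ever formed for it.
wet_means <- aggregate(PPT ~ Year + Month, data = subset(toy, PPT >= 0.2), FUN = mean)
print(wet_means)
```

Only March appears in `wet_means`; April is missing, which is the behaviour described above.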

Please read the info on how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). — Jaap, May 14 '16 at 12:23

score 1 · Accepted Answer · answered May 14 '16 at 13:41

Using dplyr:

MONTHLY.MEAN <- STATION %>% group_by(Year, Month) %>%
  summarise(s = mean(PPT[PPT >= 0.2]))  # subset inside summarise, so every month keeps its group
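Why this keeps April: the grouping is done on all rows, and the 0.2 mm condition is applied only inside the summary function. A base-R sketch of the same idea, again on hypothetical toy data and with `aggregate()` standing in for `group_by()`/`summarise()`:

```r
# Hypothetical toy data (values invented for illustration)
toy <- data.frame(
  Year  = rep(2007L, 4),
  Month = c(3L, 3L, 4L, 4L),
  PPT   = c(0.4, 0.0, 0.1, 0.0)
)

# Every Year-Month group exists before the condition is applied; a month with
# no day >= 0.2 mm passes an empty vector to mean(), which returns NaN.
monthly <- aggregate(PPT ~ Year + Month, data = toy,
                     FUN = function(p) mean(p[p >= 0.2]))
print(monthly)
```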

A possible solution using data.table:

library(data.table)

setDT(STATION)  # converts STATION to a data.table in place

MONTHLY.MEAN <- STATION[, .(s = mean(PPT[PPT >= 0.2])), by = .(Year, Month)]

In both versions, months that have no PPT value of at least 0.2 mm are still included, with a mean of NaN (the mean of an empty vector). You can easily convert those NaN values to zero.
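A minimal sketch of that conversion in base R, applied to a hypothetical two-month result (the data frame `monthly` and its values are invented for illustration; the column is called `s` as in the dplyr code):

```r
# Hypothetical result: April 2007 came out as NaN because it had no wet day
monthly <- data.frame(Year = c(2007L, 2007L), Month = c(3L, 4L), s = c(0.4, NaN))

# is.nan() is TRUE only for NaN, not for NA, so a month whose mean is NA
# (e.g. because of missing PPT readings) is left untouched.
monthly$s[is.nan(monthly$s)] <- 0
print(monthly)
```

Using `is.nan()` rather than `is.na()` keeps genuinely missing data distinguishable from a dry month. In a dplyr pipeline the same step can be written as `mutate(s = ifelse(is.nan(s), 0, s))`.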

score 0 · Answer 2 · answered May 14 '16 at 14:29

Consider row binding filtered aggregates:

MONTHLY.MEAN <- bind_rows(
  # wet months: mean over days with PPT >= 0.2
  STATION %>% group_by(Year, Month) %>%
    filter(PPT >= 0.2) %>% summarise(s = mean(PPT)),
  # dry months: no day reaches 0.2, so report 0
  STATION %>% group_by(Year, Month) %>%
    filter(max(PPT) < 0.2) %>% summarise(s = 0)
)

# RE-ORDER DATA FRAME chronologically
MONTHLY.MEAN <- arrange(ungroup(MONTHLY.MEAN), Year, Month)
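The same row-binding idea can be sketched in base R on hypothetical toy data (values invented for illustration; `aggregate()` replaces the dplyr pipelines). Assuming PPT has no missing values, a month either has at least one day >= 0.2 or has a maximum below 0.2, so the two pieces never overlap and each month appears exactly once:

```r
# Hypothetical toy data (values invented for illustration)
toy <- data.frame(
  Year  = rep(2007L, 4),
  Month = c(3L, 3L, 4L, 4L),
  PPT   = c(0.4, 0.0, 0.1, 0.0)
)

# Wet months: mean over days with PPT >= 0.2 (dry months drop out here)
wet <- aggregate(PPT ~ Year + Month, data = subset(toy, PPT >= 0.2), FUN = mean)

# Dry months: the monthly maximum is below 0.2, so every day is below 0.2
month_max <- aggregate(PPT ~ Year + Month, data = toy, FUN = max)
dry <- month_max[month_max$PPT < 0.2, ]
dry$PPT <- numeric(nrow(dry))  # zeros, one per dry month (also safe when there are none)

# Stack the disjoint pieces and restore chronological order
out <- rbind(wet, dry)
out <- out[order(out$Year, out$Month), ]
print(out)
```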
