Summing each hydrologic year in my dataframe at 649 locations and 11,088 observations

Question

Can someone please help me? I have a dataframe with 649 different locations, each with 11,088 observations from the last 30 years. One hydrologic year spans from Sep. 1 to Aug. 31. The dataframe looks like this:

[image: the original dataframe]

What I want to end up with is something like this:

[image: the desired result]

In my original dataframe I also have a lot of missing data. If a location (e.g. 1.50.0) is missing more than 10% of its data in one hydrologic year, I do not want to keep that year in my new dataframe.

If my question is unclear, please ask. :)

Welcome to SO. Please do not post images of your data! Use `dput` to create a data structure, and then post it here. Also, read: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Wimpel, Oct 08 '18 at 09:57

score 0 · Answer 1 · answered Oct 08 '18 at 11:00

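Without data it's not easy, but it may be something like this:

    library(tidyverse)   # dplyr, tidyr, tibble
    library(lubridate)   # today(), days(), year()

    # Toy data: 10 daily values for two variables, with one NA in d1
    df <- data.frame(d1 = c(rnorm(9, 5, 2), NA),
                     d2 = rnorm(10, 15, 2))
    row.names(df) <- as.character(seq(today() - days(9), today(), "day"))

    df %>%
      rownames_to_column("id") %>%
      gather(variable, value, -id) %>%
      mutate(yr = year(id)) %>%                     # calendar year, not hydrologic year
      group_by(yr) %>%
      mutate(is_na = sum(is.na(value)) / n()) %>%   # NA share per year, across all variables
      filter(is_na < .1) %>%                        # keep years with less than 10% NA
      group_by(yr, variable) %>%
      summarise(res = mean(value, na.rm = TRUE)) %>%
      spread(variable, res)
    # A tibble: 1 x 3
    # Groups:   yr [1]
         yr    d1    d2
      <dbl> <dbl> <dbl>
    1 2018.  4.41  14.7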
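
The code above groups by calendar year and takes the mean, while the question asks for a sum per hydrologic year (Sep. 1 to Aug. 31) with the 10% rule applied per location. A minimal sketch of that variant follows. It is not tested against the real data and rests on several assumptions: the data have a `date` column and one column per location, missing observations are stored as `NA` rather than as absent rows, and a hydrologic year is labelled by the calendar year in which it ends. The toy data and the names `loc_a`, `loc_b`, `hyd_year`, `na_share` and `total` are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical toy data: two full hydrologic years, one column per location
set.seed(1)
dates <- seq(as.Date("2016-09-01"), as.Date("2018-08-31"), by = "day")
df <- data.frame(
  date  = dates,
  loc_a = rnorm(length(dates), 5, 2),
  loc_b = rnorm(length(dates), 15, 2)
)
# Blank out the first 60 days of loc_a (more than 10% of that hydrologic year)
df$loc_a[1:60] <- NA

res <- df %>%
  pivot_longer(-date, names_to = "location", values_to = "value") %>%
  # Sep-Dec belong to the hydrologic year that ends on Aug 31 of the next year
  mutate(hyd_year = year(date) + (month(date) >= 9)) %>%
  group_by(location, hyd_year) %>%
  summarise(
    na_share = mean(is.na(value)),         # share of missing values in this group
    total    = sum(value, na.rm = TRUE),   # yearly sum over the non-missing values
    .groups  = "drop"
  ) %>%
  filter(na_share <= 0.1) %>%              # drop location-years with more than 10% NA
  select(-na_share) %>%
  pivot_wider(names_from = location, values_from = total) %>%
  arrange(hyd_year)

res
```

Because the filter runs before `pivot_wider()`, a dropped location-year shows up as `NA` in the wide result instead of removing the whole row. Note that `na_share` is relative to the rows present in each group, so if missing days are absent from the data entirely, the share would need to be computed against the expected number of days instead.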

Thank you, @jyjek! What if, instead of filtering on is_na < 0.1, I just set the values that fail that check to 0? Any suggestions? — IJH, Oct 12 '18 at 09:49
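
One possible way to do what this comment asks is to keep every group and overwrite the result with `if_else()` instead of calling `filter()`. The sketch below is an illustration only: it uses made-up toy data with a fixed start date, checks the NA share per variable and year (an assumption, since the answer checks it per year only), and sums rather than averages to match the question title.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(lubridate)

# Hypothetical toy data: d1 has 2 of 10 values missing (20%), d2 has none
set.seed(42)
df <- data.frame(d1 = c(rnorm(8, 5, 2), NA, NA),
                 d2 = rnorm(10, 15, 2))
row.names(df) <- as.character(seq(as.Date("2018-10-01"), by = "day", length.out = 10))

res <- df %>%
  rownames_to_column("id") %>%
  pivot_longer(-id, names_to = "variable", values_to = "value") %>%
  mutate(yr = year(as.Date(id))) %>%
  group_by(yr, variable) %>%
  summarise(
    is_na   = mean(is.na(value)),          # share of missing values per group
    res     = sum(value, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  # Keep the sum where fewer than 10% are missing, otherwise set it to 0
  mutate(res = if_else(is_na < 0.1, res, 0)) %>%
  select(-is_na) %>%
  pivot_wider(names_from = variable, values_from = res)

res
```

A 0 written this way cannot be told apart from a genuine sum of zero, so replacing the `0` with `NA_real_` may be the safer choice if the result feeds into further statistics.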
