
Can someone please help me? I have a data frame with 649 different locations, each with 11088 observations from the last 30 years. One hydrological year spans from Sep. 1 to Aug. 31. The data frame looks like this:

enter image description here

What I want to end up with is something like this:

enter image description here

In my original data frame, a lot of data is also missing. If a location (e.g. 1.50.0) is missing more than 10% of its data in one hydrological year, I do not want to keep that year in my new data frame.

If my question is unclear, please ask. :)

Wimpel
IJH
  • Welcome to SO. Please do not post images of your data! Use `dput` to create a data structure, and then post it here. Also, read: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Wimpel Oct 08 '18 at 09:57

1 Answer

Without data it's not easy, but it may be something like this:

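    library(tidyverse)
    library(lubridate)

    # Toy data: two locations (d1, d2) with ten daily values; d1 has one NA
    df <- data.frame(d1 = c(rnorm(9, 5, 2), NA),
                     d2 = rnorm(10, 15, 2))
    # Store the last ten dates as row names (ISO-formatted strings)
    row.names(df) <- as.character(seq(today() - days(9), today(), "day"))

    df %>%
      rownames_to_column("id") %>%
      gather(variable, value, -id) %>%
      mutate(yr = year(id)) %>%
      group_by(yr) %>%
      # share of missing values per calendar year, across all columns
      mutate(is_na = sum(is.na(value)) / n()) %>%
      filter(is_na < .1) %>%
      group_by(yr, variable) %>%
      summarise(res = mean(value, na.rm = TRUE)) %>%
      spread(variable, res)
    # A tibble: 1 x 3
    # Groups:   yr [1]
         yr    d1    d2
      <dbl> <dbl> <dbl>
    1 2018.  4.41  14.7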
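
The code above groups by calendar year and computes the missing share across all columns at once. To get closer to the question (hydrological years from Sep. 1 to Aug. 31, and a 10% limit per location), a rough sketch could look like the following. The toy data, the column names `date`, `loc_a`, `loc_b`, and the choice to label a hydrological year by the calendar year in which it ends are all assumptions, since the real data were not posted. It uses `pivot_longer()`/`pivot_wider()` from newer tidyr versions in place of `gather()`/`spread()`.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical toy data: one row per day, one column per location.
set.seed(1)
dates <- seq(as.Date("2016-09-01"), as.Date("2018-08-31"), by = "day")
df <- tibble(
  date  = dates,
  loc_a = rnorm(length(dates), 5, 2),
  loc_b = rnorm(length(dates), 15, 2)
)

# Blank out 80 of the 365 days of loc_a in the second hydrological year,
# so that this location-year exceeds the 10% limit.
in_second_year <- which(df$date >= as.Date("2017-09-01"))
df$loc_a[in_second_year[1:80]] <- NA

res <- df %>%
  pivot_longer(-date, names_to = "location", values_to = "value") %>%
  # Assumption: a hydrological year is labelled by the year it ends in,
  # so Sep-Dec are shifted into the following year.
  mutate(hyd_year = year(date) + (month(date) >= 9)) %>%
  group_by(location, hyd_year) %>%
  summarise(
    frac_na = mean(is.na(value)),      # share of missing values
    res     = mean(value, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  # Keep a location-year only if at most 10% of its values are missing.
  filter(frac_na <= 0.1) %>%
  select(-frac_na) %>%
  pivot_wider(names_from = location, values_from = res)

res
```

Here `loc_a` ends up as `NA` for hydrological year 2018 in the wide result, because that location-year was dropped before reshaping, while `loc_b` keeps both years. The yearly mean is only a stand-in for whatever summary is actually wanted.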
jyjek
  • Thank you, @jyjek! What if, instead of filtering on `is_na < 0.1`, I just wanted to set the corresponding values to 0? Any suggestions? – IJH Oct 12 '18 at 09:49
  • @IJH Then you would keep only the years without any `NA`. – jyjek Oct 12 '18 at 09:52
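
One possible reading of the follow-up is to keep every location-year in the table but overwrite the summary value where too much data is missing, rather than dropping the row. A minimal sketch under that assumption, using a made-up per-year summary table (`yearly`, `frac_na`, and `res` are hypothetical names):

```r
library(dplyr)

# Hypothetical long-format summary: one row per location and hydrological year.
yearly <- tibble(
  location = c("loc_a", "loc_a", "loc_b", "loc_b"),
  hyd_year = c(2017, 2018, 2017, 2018),
  frac_na  = c(0.00, 0.22, 0.05, 0.00),  # share of missing values
  res      = c(5.1, 4.8, 15.2, 14.9)     # yearly summary value
)

# Mask the value instead of removing the row when more than 10% is missing.
masked <- yearly %>%
  mutate(res = if_else(frac_na > 0.1, NA_real_, res))

masked
```

`NA_real_` can be swapped for `0` if a literal zero is really wanted, but a zero would be indistinguishable from a genuine measurement of zero, so `NA` is probably the safer marker.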