1

Using the Kendall Seasonal Trend Test from the EnvStats package to assess water quality data, but I am having trouble adjusting the code to:

  1. return trend statistics for individual sites/stations (only thing that has worked so far is subsetting each site/variable combination)

  2. consolidate the output into a list or other compact form

The test itself is simple:

kendallSeasonalTrendTest(WQconc ~ Season + Year, data = ____) 

With numerous variables and monitoring sites combinations to assess for , there must be a better way. It seems like the apply family may be relevant here, but here is what hasn't worked so far:

 data.frame':   761 obs. of  13 variables:
 $ Year        : int  2007 2007 2007 2007 2008 2008 2008 2008 2009 2009 ...
 $ Quarter     : Factor w/ 4 levels "Fall","Spring",..: 4 2 3 1 4 2 3 1 4 2 ...
 $ Date        : Factor w/ 319 levels "1/10/2007","1/10/2013",..: 1 177 285 40 5 183 261 43 6 184 ...
 $ Site_ID     : Factor w/ 24 levels "s01541000","s01541500",..: 17 17 17 17 17 17 17 17 17 17 ...
 $ DO          : num  7.57 6.44 6.44 7.79 8.68 8.5 8.21 9.91 9.55 11.7 ...
 $ Flow        : num  33259 8912 2002 803 27680 ...
 $ Iron        : num  196 108 103 126 136 77 17 214 90 8 ...
 $ Magnesium   : num  205 309 75 92 190 285 71 98 320 263 ...
 $ pH          : num  6.75 7.4 8 7.8 6.1 7.4 7.7 7.8 7.6 6.8 ...
 $ Temperature : num  3.5 5.7 27.7 20.1 5.5 9.7 23.8 17.6 0.1 8.7 ...
 $ TDS         : num  163 15 80 92 178 23 77 93 35 33 ...
 $ TP          : num  1 1 1 8 12 7 16 2 8 63 ...

 kendallSeasonalTrendTest(DO ~ Quarter + Year, data =water1)  

But that doesn't work exactly, returns test statistics for the entire dataset rather than by Site ID. Would be idea if the results were in a list form

I tried transitions of the apply family, but I really don't understand them yet either. Appreciate any thoughts!

www
  • 38,575
  • 12
  • 48
  • 84
Sam
  • 21
  • 1

1 Answers1

1

Simply use lapply which returns a list of kendallSeasonalTrendTest() results equal to the length of input vector/list, specifically distinct Site_IDs retrieved using unique(). Two subset types are shown below.

And then run lapply across each variable. Also, use setNames() to name each list element for easy reference (e.g., DO_results$s01541000)

# DO Results
DO_results <- lapply(unique(water1$Site_ID), function(s) 
                kendallSeasonalTrendTest(DO ~ Quarter + Year, 
                                         data = subset(water1, Site_ID == s)))
# NAMING EACH LIST ELEMENT BY CORRESPONDING SITE_ID
DO_results <- setNames(DO_results, unique(water1$Site_ID))

# Flow Results
Flow_results <- lapply(unique(water1$Site_ID), function(s) 
                  kendallSeasonalTrendTest(Flow ~ Quarter + Year, 
                                           data = water1[water1$Site_ID == s,]))  
# NAMING EACH LIST ELEMENT BY CORRESPONDING SITE_ID
Flow_results <- setNames(Flow_results, unique(water1$Site_ID))
...
Parfait
  • 104,375
  • 17
  • 94
  • 125