I'm using dplyr and Hmisc to prepare a table of weighted statistics by group as per the R code below.
library(Hmisc) # weighted statistics
library(dplyr) # group_by(), summarise(), %>%
StTbl <- iris %>%
  group_by(Species) %>% # group by species
  summarise(
    n = n(), # number of records
    WtMn = wtd.mean(Sepal.Length, Petal.Width), # weighted mean
    WtSd = sqrt(wtd.var(Sepal.Length, Petal.Width)), # weighted SD
    WtCV = WtMn / WtSd, # weighted mean / SD (the reciprocal of the usual CV)
    Minm = min(Sepal.Length), # minimum
    Wp05 = wtd.quantile(Sepal.Length, Petal.Width, 0.05), # p05
    Wp50 = wtd.quantile(Sepal.Length, Petal.Width, 0.50), # p50
    Wp95 = wtd.quantile(Sepal.Length, Petal.Width, 0.95), # p95
    Wp975 = wtd.quantile(Sepal.Length, Petal.Width, 0.975), # p975
    Wp99 = wtd.quantile(Sepal.Length, Petal.Width, 0.99), # p99
    Maxm = max(Sepal.Length) # maximum
  )
StTbl
# A tibble: 3 x 12
  Species        n  WtMn  WtSd  WtCV  Minm  Wp05  Wp50  Wp95 Wp975  Wp99  Maxm
  <fct>      <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 setosa        50  5.05 0.356  14.2   4.3  4.61  5.06  5.62  5.70  5.72   5.8
2 versicolor    50  5.98 0.508  11.8   4.9  5.13  6     6.80  6.97  7      7
3 virginica     50  6.61 0.626  10.6   4.9  5.8   6.5   7.7   7.7   7.9    7.9
Now, rather than using the column names of the table, I want to refer to columns by index so that I can loop through several columns and prepare the same statistics table for each one. I've found a number of suggestions on how to do this on StackOverflow, including:
- Use double or single square brackets with the table name and index number, for example substituting ".[[1]]" or "iris[1]" for "Sepal.Length" in the code above - these suggestions run without errors but return NA results
- Use the get function, such as "get(iris[1])" - this suggestion returns an "invalid first argument" error
- The suggestion that dplyr does not really support column indexes, that using a column index is a bad idea, and that I should tackle the problem another way - I'm not sure what that other 'tidyverse' way would be
- Write a custom function - I'm not sure where to start with this for my example
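
One possible starting point for the custom-function route in the list above, offered as a sketch rather than a definitive answer: translate the index into a column name with names(), then refer to that column inside summarise() through dplyr's .data pronoun, which looks the column up within the current group. The helper name wtd_stats_by_index and its arguments are hypothetical, the weight column is assumed to stay Petal.Width as in the code above, across() assumes dplyr 1.0.0 or later, and only a few of the statistics are shown to keep the example short.

```r
library(Hmisc) # wtd.mean(), wtd.var(), wtd.quantile()
library(dplyr) # group_by(), summarise(), across(), %>%

# Hypothetical helper: weighted statistics for one column chosen by position.
# wt_col and grp_col are passed as strings so they can be swapped as well.
wtd_stats_by_index <- function(data, col_idx,
                               wt_col = "Petal.Width",
                               grp_col = "Species") {
  col_name <- names(data)[col_idx] # turn the index into a column name
  data %>%
    group_by(across(all_of(grp_col))) %>%
    summarise(
      Variable = col_name, # records which column the row describes
      n    = n(),
      WtMn = wtd.mean(.data[[col_name]], .data[[wt_col]]),
      WtSd = sqrt(wtd.var(.data[[col_name]], .data[[wt_col]])),
      Wp50 = wtd.quantile(.data[[col_name]], .data[[wt_col]], 0.50),
      Maxm = max(.data[[col_name]]),
      .groups = "drop"
    )
}

# Loop over columns 1 to 3 of iris, producing one table per column
tables <- lapply(1:3, function(i) wtd_stats_by_index(iris, i))

# Optionally stack the per-column tables into a single table
all_stats <- bind_rows(tables)
```

Here tables[[1]] covers Sepal.Length, so its WtMn column should agree with the WtMn values in the table above. The remaining statistics (Wp05, Wp95 and so on) can be added to summarise() following the same pattern.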