In survey research we often have blocks of variables whose names start with the same characters, such as "Q4a", "Q4b", etc. Is it possible to avoid manually entering the column names starting with "Q4" as arguments of the tab_cells function of the expss package, and instead use a selector similar to starts_with("Q4") from the tidyverse? I know how to do it using tidyverse, but expss offers many options that are not easily reproducible with other packages (e.g., nicely formatted tables with the results of pairwise significance testing of means/column proportions).

The following code tabulates the means of columns Q4a and Q4b, but it requires me to enter the column names (Q4a, Q4b) manually instead of selecting all variables starting with "Q4".

# load expss library
library(expss)

# create a sample data frame
data=data.frame(Q1=c(1,2,3),
                Q4a=c(3,4,5),
                Q4b=c(6,7,8))

# tabulate means of columns starting with "Q4" (entered manually as arguments of the tab_cells function)
data %>% 
    tab_cells(Q4a,Q4b) %>%
    tab_cols(total()) %>% 
    tab_stat_fun(Mean = w_mean) %>%
    tab_pivot()
statadvice

1 Answer

The documentation for ?tab_cells says you can use mrset/mdset for multiple-response variables. See ?mrset for more details.

  • mrset_p and mdset_p: select variables for multiple responses by Perl-style regular expression.
data %>% 
    tab_cells(mrset_p("^Q4")) %>%    # similar to matches("^Q4") in <tidy-select>
    tab_cols(total()) %>% 
    tab_stat_fun(Mean = w_mean) %>%
    tab_pivot()
data %>% 
    tab_cells(mrset(Q4a %to% Q4b)) %>%    # similar to Q4a:Q4b in <tidy-select>
    tab_cols(total()) %>% 
    tab_stat_fun(Mean = w_mean) %>%
    tab_pivot()
Output
# |     |      | #Total |
# | --- | ---- | ------ |
# | Q4a | Mean |      4 |
# | Q4b | Mean |      7 |
Darren Tsai
  • For means it is better to use `..p`, e.g. `..p("^Q4")`. The result of `mrset_*` is treated as a multiple-response variable, which gets special handling in many cases. – Gregory Demin Mar 03 '23 at 09:51
  • @gregorydemin Thank you. With means, `..p("^Q4")` works great, but when I use `..p("^Q4")` and change the statistic from mean to column percent with `tab_stat_cpct()`, the labels (Q4a and Q4b) are no longer displayed in the rows. – statadvice Mar 06 '23 at 02:53
  • @statadvice Generally, `tab_stat_cpct` expects variable labels. To use variable names instead, add `tab_prepend_names()` to the pipeline: `data %>% tab_prepend_names() %>% ...` – Gregory Demin Mar 07 '23 at 09:05
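Putting the comments together, the two suggestions might look like the sketch below, which reuses the sample data frame from the question. It is a minimal illustration rather than a verified recipe: the placement of `tab_prepend_names()` right after `data` follows the pattern given in the comment, and the names `means_tbl` and `cpct_tbl` are introduced here only to hold the results.

```r
# load expss library
library(expss)

# sample data frame from the question
data <- data.frame(Q1  = c(1, 2, 3),
                   Q4a = c(3, 4, 5),
                   Q4b = c(6, 7, 8))

# means of all columns whose names match the Perl-style regex "^Q4",
# selected with ..p() instead of typing Q4a, Q4b by hand
means_tbl <- data %>%
    tab_cells(..p("^Q4")) %>%
    tab_cols(total()) %>%
    tab_stat_fun(Mean = w_mean) %>%
    tab_pivot()

# column percentages for the same columns; tab_prepend_names() asks expss
# to show variable names in the row headers when no variable labels are set
cpct_tbl <- data %>%
    tab_prepend_names() %>%
    tab_cells(..p("^Q4")) %>%
    tab_cols(total()) %>%
    tab_stat_cpct() %>%
    tab_pivot()

means_tbl
cpct_tbl
```

If the variables already carry labels (for example, set with `apply_labels()`), `tab_prepend_both()` may be a better fit, since it shows both the name and the label.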