How to select row/column variables starting with a particular set of characters (e.g., Q4) in R package expss?

Question

In survey research we often have blocks of variables whose names start with the same characters, such as "Q4a", "Q4b", etc. Is it possible to avoid manually entering every column name starting with "Q4" as an argument of the tab_cells function in the expss package, and instead use a selector similar to tidyselect's starts_with("Q4")? I know how to do it using tidyverse, but expss offers many options that are not easily reproducible with other packages (e.g., nicely formatted tables with the results of pairwise significance testing of means/column proportions).

The following code tabulates the means of columns Q4a and Q4b, but it requires me to enter the column names (Q4a, Q4b) manually instead of selecting all variables starting with "Q4".

# load expss library
library(expss)

# create a sample data frame
data <- data.frame(Q1 = c(1, 2, 3),
                   Q4a = c(3, 4, 5),
                   Q4b = c(6, 7, 8))

# tabulate means of columns starting with "Q4" (entered manually as arguments of the tab_cells function)
data %>% 
    tab_cells(Q4a,Q4b) %>%
    tab_cols(total()) %>% 
    tab_stat_fun(Mean = w_mean) %>%
    tab_pivot()

Darren Tsai · Accepted Answer · 2023-03-04T08:19:11.567


The documentation for ?tab_cells says that you can use mrset/mdset for multiple-response variables. See ?mrset for more details.

mrset_p and mdset_p: select variables for multiple responses by Perl-style regular expression.

data %>% 
    tab_cells(mrset_p("^Q4")) %>%    # similar to matches("^Q4") in <tidy-select>
    tab_cols(total()) %>% 
    tab_stat_fun(Mean = w_mean) %>%
    tab_pivot()

data %>% 
    tab_cells(mrset(Q4a %to% Q4b)) %>%    # similar to Q4a:Q4b in <tidy-select>
    tab_cols(total()) %>% 
    tab_stat_fun(Mean = w_mean) %>%
    tab_pivot()

Output

# |     |      | #Total |
# | --- | ---- | ------ |
# | Q4a | Mean |      4 |
# | Q4b | Mean |      7 |

edited Mar 04 '23 at 08:19

answered Mar 03 '23 at 06:24



For means it is better to use `..p`, e.g. `..p("^Q4")`. The result of `mrset_*` is treated as a multiple-response variable, which gets special treatment in many cases. – Gregory Demin Mar 03 '23 at 09:51
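
The suggestion in the comment above can be sketched as follows. This is a minimal example that reuses the sample data frame from the question and swaps `mrset_p("^Q4")` for `..p("^Q4")`, so that the matched columns are treated as ordinary separate variables rather than one multiple-response set. The name `tbl` is introduced here only for illustration.

```r
# load expss library
library(expss)

# sample data frame from the question
data <- data.frame(Q1 = c(1, 2, 3),
                   Q4a = c(3, 4, 5),
                   Q4b = c(6, 7, 8))

# ..p() selects variables whose names match a Perl-style regular expression;
# "^Q4" matches every column name starting with "Q4"
tbl <- data %>%
    tab_cells(..p("^Q4")) %>%
    tab_cols(total()) %>%
    tab_stat_fun(Mean = w_mean) %>%
    tab_pivot()

tbl
```
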
@gregorydemin Thank you. With means, `..p("^Q4")` works great, but when I use `..p("^Q4")` and change the statistic from mean to column percent with `tab_stat_cpct()`, the row labels (Q4a and Q4b) are no longer displayed. – statadvice Mar 06 '23 at 02:53

@statadvice Generally, `tab_stat_cpct` wants variable labels. To use variable names instead, add `tab_prepend_names()` to the pipeline: `data %>% tab_prepend_names() %>% ... ` – Gregory Demin Mar 07 '23 at 09:05
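
A minimal sketch of that last suggestion, assuming the same sample data as in the question and that `tab_prepend_names()` goes directly after the data frame, as the comment indicates. The exact row labels in the result may depend on the expss version, so no output is shown. The name `tbl_cpct` is introduced here only for illustration.

```r
# load expss library
library(expss)

# sample data frame from the question
data <- data.frame(Q1 = c(1, 2, 3),
                   Q4a = c(3, 4, 5),
                   Q4b = c(6, 7, 8))

# tab_prepend_names() asks expss to prepend variable names to the labels,
# so the unlabelled columns Q4a and Q4b can be identified in the rows
tbl_cpct <- data %>%
    tab_prepend_names() %>%
    tab_cells(..p("^Q4")) %>%
    tab_cols(total()) %>%
    tab_stat_cpct() %>%
    tab_pivot()

tbl_cpct
```

If the variables carry labels (for example, set with `apply_labels()`), `tab_stat_cpct()` can use those labels directly.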
