svymean does not consider recoded variables

Question

I am very new to R, so I apologize for any misused terms and for the German words; I hope my issue is understandable anyway. For my data project I recoded character variables into binary ones.

genZ_prep %>%
mutate(life_satisf = factor(case_when(
 life_satisf %in% c("Sehr zufrieden", "Zufrieden") ~ 1,
 life_satisf %in% c("Weniger zufrieden",
               "Gar nicht zufrieden") ~ 0),
 levels = c(1, 0), 
 labels = c("satisfied", "unsatisfied"))) %>%
mutate(mat_satisf = factor(case_when(
 mat_satisf %in% c("Selten", "Nie") ~ 1,
 mat_satisf %in% c("Häufig",
               "Gelegentlich") ~ 0),
 levels = c(1, 0), 
 labels = c("yes", "no")))

I then created a svydesign object:

genZ_prep_str <-
  svydesign(data = genZ_prep,
            id = ~ 1,
            strata = ~ state)

Next, I wanted to estimate means with svymean, but instead of mean values for the new categories (satisfied, unsatisfied), it displays mean values for the original responses (Sehr zufrieden, Zufrieden, ...):

svymean( ~ mat_satisf, design = genZ_prep_str, na.rm = TRUE)

Do I have to add extra code, or did I make a mistake? Also, the variables life_satisf and mat_satisf contain missing values, but at this point I am not asked to deal with them specifically. Would na.rm = TRUE be the correct way to handle them when computing svymean/svytotal?

It doesn't look like you saved the results of all the `mutate()` calls. Those commands don't update the original data.frame. They return a new data object that you need to save if you want to use later. It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. — MrFlick, Feb 23 '22 at 22:47

score 1 · Answer 1 · answered Feb 24 '22 at 13:12

I followed @MrFlick's hint and saved the mutated variables. However, I saved them in the genZ_prep dataset instead of the original genZ. Would that also be reasonable?

genZ_prep <- 
  genZ_prep %>%
  mutate(life_satisf = factor(case_when(
    life_satisf %in% c("Sehr zufrieden", "Zufrieden") ~ 1,
    life_satisf %in% c("Weniger zufrieden",
                  "Gar nicht zufrieden") ~ 0),
    levels = c(1, 0), 
    labels = c("satisfied", "unsatisfied"))) %>%
  mutate(mat_satisf = factor(case_when(
    mat_satisf %in% c("Selten", "Nie") ~ 1,
    mat_satisf %in% c("Häufig",
                  "Gelegentlich") ~ 0),
    levels = c(1, 0), 
    labels = c("yes", "no")))

> I created an arbitrary data set with what you demonstrated in your data.

Wow, thanks for your effort! I tried to create such an arbitrary data set myself, but I became quite frustrated when I made mistakes. The deadline is coming closer, so my nerves are a bit fragile...

> Were you aware that yes is set to rarely or never?

mat_satisf is supposed to say whether people can usually fulfill their material desires. So people who are rarely or never confronted with unfulfilled wishes due to financial constraints are satisfied ("yes"). It's a bit tricky, but it should work, I guess.

> Then I ran the svydesign and svymean. However, I only have these three variables, so there isn't going to be very meaningful information from these calls.

Actually, there is a third variable, which I didn't recode because it is already binary. The states are only supposed to be my strata, so that I can create a stratified survey design object (the survey itself uses quota sampling). I am not sure yet whether it makes sense to include state at this point, but I didn't know how else to include it.

score 0 · Answer 2 · answered Feb 24 '22 at 05:24

These functions appear to all be working with what I can see in your data and what you've used. You do need to take into account the advice that @MrFlick provided. For nearly all functions in R, you have to use an assignment operator to actually change the object.

Whenever you make changes, you should always validate that the change was as you had expected it to be, as well. That's important!
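As a minimal sketch of the assignment point, assuming a small made-up data frame (`toy` and its `answer` column are hypothetical, not your data): the piped `mutate()` returns a new data frame, and the original only changes once the result is assigned back.

```r
library(dplyr)

# Hypothetical toy data, not the asker's data set
toy <- data.frame(answer = c("Nie", "Selten", "Gelegentlich"),
                  stringsAsFactors = FALSE)

# Without assignment: the result is only printed, toy itself is untouched
toy %>% mutate(answer = factor(answer))
class_before <- class(toy$answer)   # "character"

# With assignment: the recoded column is stored and visible to later calls
toy <- toy %>% mutate(answer = factor(answer))
class_after <- class(toy$answer)    # "factor"
```

Checking `class()` (or `levels()`) right after the assignment is a quick way to confirm the change really landed in the object you will pass on.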

I created an arbitrary data set with what you demonstrated in your data.

Please make your questions reproducible in the future. It can be fake data or just a small snapshot; check out the guide on making reproducible R questions.

library(tidyverse)
library(survey)

set.seed(357)
genZ_prep <- data.frame(
  life_satisf = sample(rep(c("Sehr zufrieden", "Zufrieden",
                             "Weniger zufrieden", "Gar nicht zufrieden"), 100), 300),
  mat_satisf = sample(rep(c("Selten", "Nie", "Häufig", 
                            "Gelegentlich"), each = 100), 300),
  state = sample(rep(c("Baden-Württemberg","Bavaria","Berlin","Brandenburg",
                       "Bremen","Hamburg","Hesse","Lower Saxony"), 50), 300)
)

Then I used what you had coded to modify the data. However, I used an assignment operator to save the changes and I put both case_when calls in one mutate call. (You can have two; I just combined it when I was coding is all.)

genZ <- genZ_prep %>%     # recoding levels of satisfaction (very to not at all)
  mutate(life_satisf = factor(case_when(
    life_satisf %in% c("Sehr zufrieden", "Zufrieden") ~ 1,
    life_satisf %in% c("Weniger zufrieden","Gar nicht zufrieden") ~ 0),
    levels = c(1, 0), 
    labels = c("satisfied", "unsatisfied")),
    mat_satisf = factor(case_when(    # recode levels of satis (never to often)
      mat_satisf %in% c("Selten", "Nie") ~ 1,
      mat_satisf %in% c("Häufig", "Gelegentlich") ~ 0),
      levels = c(1, 0), 
      labels = c("yes", "no")))  # yes is rare or never; no is often or occasionally

Were you aware that yes is set to rarely or never? I would have thought that you would have set yes to occasionally or often...it's not my data! Do it your way. I just thought I would mention it, in case you had not realized that.

The next part's important--validate the changes.

# validate changes
funModeling::df_status(genZ)
#      variable q_zeros p_zeros q_na p_na q_inf p_inf      type unique
# 1 life_satisf       0       0    0    0     0     0    factor      2
# 2  mat_satisf       0       0    0    0     0     0    factor      2
# 3       state       0       0    0    0     0     0 character      8 


levels(genZ$life_satisf)
# [1] "satisfied"   "unsatisfied" 

levels(genZ$mat_satisf)
# [1] "yes" "no"
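Another way to validate a recode, sketched here on a hypothetical toy vector (`original` and `recoded` are illustrative names, not columns from your data), is to cross-tabulate the old values against the new ones. Each original category should land in exactly one new level, and missing values should stay missing.

```r
library(dplyr)

# Hypothetical stand-in for the original mat_satisf column
original <- c("Selten", "Nie", "Häufig", "Gelegentlich", "Gelegentlich", NA)

# Same recoding rule as above; values matching no condition become NA
recoded <- factor(case_when(
  original %in% c("Selten", "Nie") ~ 1,
  original %in% c("Häufig", "Gelegentlich") ~ 0),
  levels = c(1, 0),
  labels = c("yes", "no"))

# Rows are the old categories, columns the new levels; NA gets its own row/column
check <- table(original, recoded, useNA = "ifany")
check
```

If any row of the table has counts in more than one column, or a category ends up in the NA column unexpectedly, the recoding rule needs another look.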

Then I ran the svydesign and svymean. However, I only have these three variables, so there isn't going to be very meaningful information from these calls.

genSavy <- svydesign(data = genZ, 
                     id = ~ 1,
                     strata = ~ state)
summary(genSavy)
# Stratified Independent Sampling design (with replacement)
# svydesign(data = genZ_prep, id = ~1, strata = ~state)
# Probabilities:
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#       1       1       1       1       1       1 
# Stratum Sizes: 
#            Baden-Württemberg Bavaria Berlin Brandenburg Bremen Hamburg Hesse
# obs                       35      39     34          40     34      39    38
# design.PSU                35      39     34          40     34      39    38
# actual.PSU                35      39     34          40     34      39    38
#            Lower Saxony
# obs                  41
# design.PSU           41
# actual.PSU           41
# Data variables:
# [1] "life_satisf" "mat_satisf"  "state"      

genSm <- svymean(~mat_satisf, design = genSavy, na.rm = TRUE)
genSm
#                  mean    SE
# mat_satisfyes 0.49667 0.029
# mat_satisfno  0.50333 0.029

Roughly an even split between "yes" and "no" (0.497 vs. 0.503, so "no" is marginally ahead) for whatever mat_satisf is in this random set I created.
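On the na.rm question, here is a minimal sketch under stated assumptions: the data frame `toy` is made up, and `weights = ~1` is only there to make the equal-weight assumption explicit. With `na.rm = TRUE`, svymean drops the rows where the variable is missing before estimating, so the resulting shares describe the respondents who actually answered. Whether that is appropriate for your project depends on why the values are missing.

```r
library(survey)

# Hypothetical toy data: two strata, one factor with two missing values
toy <- data.frame(
  state = rep(c("A", "B"), each = 5),
  mat_satisf = factor(c("yes", "no", "yes", NA, "no",
                        "yes", "yes", NA, "no", "yes"),
                      levels = c("yes", "no"))
)

# Stratified design with equal weights (an assumption for this sketch)
des <- svydesign(ids = ~1, strata = ~state, weights = ~1, data = toy)

# Rows with a missing mat_satisf are dropped before the means are estimated
without_na <- svymean(~mat_satisf, design = des, na.rm = TRUE)

# With equal weights this matches the plain share among non-missing rows
plain_share <- prop.table(table(toy$mat_satisf))
```

Comparing `coef(without_na)` with `plain_share` shows the two agree in this equal-weight toy case; with real survey weights they generally will not.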
