
I want to apply weights when computing an average, and I would expect the weights to have some effect. However, the weights (pspwght*pweight) have no effect on the mean: the result is identical to the unweighted one.

# Calculate mean total 
total_mean <- ESS_subset_au %>%
  select(cntry, stfdem, wl) %>%
  group_by(cntry) %>%
  na.omit() %>%
  summarize(avg = weighted.mean(stfdem, na.rm = T, weights = weight*pspwght))

This is the data that I am using:

ESS_subset_au = structure(list(idno = c(10105L, 10107L, 10109L, 10201L, 10202L, 
10208L, 10209L, 10302L, 10305L, 10306L, 10307L, 10308L, 10309L, 
10401L, 10405L), cntry = c("BE", "BE", "BE", "BE", "BE", "BE", 
"BE", "BE", "BE", "BE", "BE", "BE", "BE", "BE", "BE"), stfdem = c(5L, 
1L, 6L, 9L, 2L, 7L, 9L, 10L, 7L, 6L, 6L, 2L, 5L, 7L, 8L), polintr = c(3L, 
3L, 3L, 3L, 3L, 2L, 2L, 4L, 3L, 3L, 3L, 1L, 2L, 3L, 4L), dweight = c(1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), pspwght = c(1.28970534688181, 
0.853106648103221, 1.28441114836477, 1.29301264667416, 0.862898819342774, 
0.853106648103221, 0.855294341165787, 0.862898819342774, 1.28970534688181, 
0.858825994323025, 0.855294341165787, 0.853112832617122, 1.28441114836477, 
0.862898819342774, 0.859350417967113), pweight = c(0.492718566, 
0.492718566, 0.492718566, 0.492718566, 0.492718566, 0.492718566, 
0.492718566, 0.492718566, 0.492718566, 0.492718566, 0.492718566, 
0.492718566, 0.492718566, 0.492718566, 0.492718566), gnd.rc = c(0, 
0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1), age.rc = c(1, 3, 7, 
2, 2, 3, 4, 3, 1, 3, 4, 5, 7, 3, 4), job.rc = c(0, 1, 0, 1, 1, 
1, 1, 0, 0, 0, 1, 1, 0, 0, 0), inc.rc = c(NA, 2, 1, 1, 2, 2, 
NA, 2, 2, 2, 2, 3, 1, 1, 1), pid.rc = c(0, 0, 0, 0, 1, 1, 0, 
0, 1, 0, 1, 1, 1, 0, 0), edu.rc = c(2, 4, 2, 2, 2, 4, 3, 2, 2, 
4, 4, 3, 2, 2, 2), trstinst = c(5.5, 3, 1.7, 6.8, 7.2, 6.3, 7.8, 
8.3, 5.8, 6, 4.5, 4, 5, 4.8, 8), wl = c(NA, NA, NA, NA, 0, 0, 
1, NA, NA, 0, 1, 0, 1, 0, NA), Inflation = c(1.11309594, 1.11309594, 
1.11309594, 1.11309594, 1.11309594, 1.11309594, 1.11309594, 1.11309594, 
1.11309594, 1.11309594, 1.11309594, 1.11309594, 1.11309594, 1.11309594, 
1.11309594), GDPg = c(0.459242193, 0.459242193, 0.459242193, 
0.459242193, 0.459242193, 0.459242193, 0.459242193, 0.459242193, 
0.459242193, 0.459242193, 0.459242193, 0.459242193, 0.459242193, 
0.459242193, 0.459242193), GDPpc = c(44355.37731, 44355.37731, 
44355.37731, 44355.37731, 44355.37731, 44355.37731, 44355.37731, 
44355.37731, 44355.37731, 44355.37731, 44355.37731, 44355.37731, 
44355.37731, 44355.37731, 44355.37731), Enep = c(10.04, 10.04, 
10.04, 10.04, 10.04, 10.04, 10.04, 10.04, 10.04, 10.04, 10.04, 
10.04, 10.04, 10.04, 10.04), PR = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), row.names = c(NA, 15L), class = "data.frame")

What am I missing?

Thank you!

  • What language is this? What is the value of dweight? There's a lot of missing info here. – Kwright02 Jun 06 '21 at 17:11
  • @Kwright02: the [tag:r] language. – r2evans Jun 06 '21 at 17:11
  • Marta, it may indicate your weights are ineffective (nearly equal?). Without representative data the question is not reproducible, so it cannot be answered empirically. – r2evans Jun 06 '21 at 17:12
  • Further (perhaps this was Kwright02's point), it's always better to be explicit about non-base packages you're using. In this case, it is likely easy to infer `dplyr`. Please [edit] your question to include this, and add in sample data (please use `dput(.)`), and expected results (that are different than what is calculated). Please see https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. Thanks! – r2evans Jun 06 '21 at 17:14
  • I see, thank you. I have updated my question. –  Jun 06 '21 at 17:26

1 Answer


As already mentioned in the comments, you should provide at least a sample of your dataset. R's dput() function is the ideal tool for sharing it.

My first guess (without seeing the data) is that the weights are the same for all values, in which case the weighted mean equals the plain mean:

x <- 1:10

mean(x)
#> [1] 5.5

anything <- 0.3
weighted.mean(x, rep(anything, length(x)))
#> [1] 5.5

Created on 2021-06-06 by the reprex package (v2.0.0)

Update after the OP provided the data:

total_mean <- ESS_subset_au %>%
  select(cntry, stfdem, wl, pspwght) %>%
  group_by(cntry) %>%
  na.omit() %>%
  summarize(wavg = weighted.mean(stfdem, w = pspwght, na.rm = TRUE),
            avg = mean(stfdem))

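Remember to include every variable you need in select(), including the weights (here pspwght). Also, stats::weighted.mean() has no weights argument; the argument is called w. A misspelled weights = ... is absorbed by the function's ... parameter and silently ignored, which is why you got the unweighted mean and no error. Finally, your dataset does not contain a weight variable; you probably meant pweight.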
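To make the argument-name point concrete, here is a minimal base R sketch. The vectors x and w are toy values chosen for illustration (they are not from the ESS data), and no_such_object is a deliberately undefined name.

```r
# Toy values for illustration only (not taken from the ESS data)
x <- c(1, 2, 3)
w <- c(3, 1, 1)

# Correct argument name: the weights are applied
weighted_ok <- weighted.mean(x, w = w)

# Wrong argument name: `weights` falls into `...` and is never used,
# so the result is the plain unweighted mean
weighted_ignored <- weighted.mean(x, weights = w)

# Because the argument is never evaluated, even an undefined object
# raises no error, which makes the mistake easy to overlook
weighted_missing <- weighted.mean(x, weights = no_such_object)

print(c(ok = weighted_ok, ignored = weighted_ignored, missing = weighted_missing))
```

The second and third calls both return mean(x), which matches the behaviour described in the question: a non-existent weight column produced no error and no change in the result.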

polkas
  • I have added sample data which shows that the weights are different for different observations. –  Jun 06 '21 at 17:25
  • I made an update after you provided the data. – polkas Jun 06 '21 at 17:36
  • How would I add the weights in this case though? ```winner_mean <- ESS_subset_au %>% filter(wl == 1) %>% group_by(cntry) %>% summarise(avg_winner = weighted.mean(stfdem, na.rm = T, w = (pspwght*pweight*10e2)))``` –  Jun 06 '21 at 17:40
  • In the new case, `pspwght * pweight` are your weights. You could drop the 10e2, since multiplying the weights by a constant does not change the weighted mean. What is your question, precisely? – polkas Jun 06 '21 at 17:44
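The point about constant factors in the last comment can be checked with a short sketch. It uses the first five stfdem and pspwght values from the question's sample data and assumes, as in that sample, that pweight is the same for every row within a country.

```r
# First five stfdem / pspwght values from the question's sample data
stfdem  <- c(5, 1, 6, 9, 2)
pspwght <- c(1.28970534688181, 0.853106648103221, 1.28441114836477,
             1.29301264667416, 0.862898819342774)
pweight <- 0.492718566  # constant for all BE rows in the sample

by_pspwght <- weighted.mean(stfdem, w = pspwght)
by_product <- weighted.mean(stfdem, w = pspwght * pweight)
by_scaled  <- weighted.mean(stfdem, w = pspwght * pweight * 10e2)

# A constant factor cancels between numerator and denominator,
# so all three results agree up to floating-point tolerance
print(all.equal(by_pspwght, by_product))
print(all.equal(by_product, by_scaled))

# The weights themselves do matter: this differs from the plain mean
print(c(weighted = by_pspwght, unweighted = mean(stfdem)))
```

So within a single country group, pspwght alone gives the same weighted mean as pspwght * pweight. The pweight factor would only change the result in a mean pooled across countries whose pweight values differ.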