I have a large dataset from a survey. I already have a column/variable containing a weight that should be applied to the whole data set. This is simple to do in SPSS, but I would like to do it in R as well. What I already know is how to apply the weighting variable to an individual column, like this:

gend <- wtd.table(master.data$Q10_GENDER, weights = master.data$Weight_Age_Gender_Income)

This works great for one variable, but I would hate to have to do that for every single command I run. Is there a way to apply the weight to the entire data set?

I reduced a larger subset of my external database into a simple three variable DF, and I would like to apply one pre-calculated weight variable to the entire DF.

test <- data.frame(br$Q10_GENDER, 
               br$Q5B_URBANICITY, 
               br$S4_AGE_GROUP_1)

br$Weight_Age_Gender_Income

Summary output:

 br.Q10_GENDER             br.Q5B_URBANICITY     br.S4_AGE_GROUP_1
 Female:4986    Urban, city center      :8791    18-24  :3048
 Male  :4893    Suburbs surrounding city: 827    25-29  :1664
 Other :  44    Rural                   : 305    30-34  :1218
                                                 35-39  : 954
                                                 40-44  : 806
                                                 13-17  : 763
                                                 (Other):1470
cam417
  • It would be helpful if you could include a minimal representation of your sample dataset. Have you tried `dplyr::mutate_at`? – Peter Apr 14 '20 at 19:42
  • Peter - the data set has hundreds of columns (questions in the survey) and over 10k rows (one per respondent). Does that help? There are multiple different types of data in the columns, but the majority are categorical text variables like gender. – cam417 Apr 14 '20 at 21:20
  • That's the point of abstracting your huge dataframe into a simple, small as possible dataframe which only includes information that addresses your question. I find this link helpful: [mre]. Could you provide a dataframe with 3 or 4 columns (or as many unique column data types as you have) and no more than three rows. This might succinctly summarise the issue you are trying to solve and enable others to help. – Peter Apr 15 '20 at 00:06
  • So you think that it would be most helpful to identify the variables/columns in my original dataset that I am most interested in investigating, create a new DF with those, and apply the weight to just that dataframe? If so, what command would be best for that? – cam417 Apr 15 '20 at 17:55
  • I'm asking for a simplified dataset in order to help with answering your question. Without seeing a minimal set of data, any help is just guesswork. Does the proposed answer below help in any way? This link may help: – Peter Apr 15 '20 at 18:55
  • @Peter thank you! I tried to reproduce a minimal example DF above in my question. Does that help? – cam417 Apr 17 '20 at 17:46
  • Thank you. You are getting there. You do need to add some data! You can use a built-in dataset from R, create one as in the answer below, or extract a partial data set from your own data. You can use `dput()`. – Peter Apr 17 '20 at 17:54
  • Have you tried to see if the proposed answer is of any help? – Peter Apr 17 '20 at 17:56
  • How about that addition up top with summary output? When I try the below it says "* Not meaningful for factors" @Peter – cam417 Apr 17 '20 at 18:02
  • Sorry Cam417. A data frame means a data frame. Have you read through the posts on creating a minimal reproducible example [mre]? It is really important to do this: not only does it enable others to help you, but it also helps you understand the data and the problem better. – Peter Apr 17 '20 at 18:06
  • I'm sorry I guess I don't understand what's being asked for. I thought I created a dataframe and included it above in my example? – cam417 Apr 17 '20 at 18:09
  • You have created a dataframe but no-one can read it on SO as the dataframe `br` you are creating the data from is on your computer not in the post. Take time to read [mre]; Try copying the example below into your console to get a better understanding of what you should do to help us help you. It all does take time, but that's the way it is. – Peter Apr 17 '20 at 18:19

2 Answers

This might be a long shot, since it is not clear what your data looks like.

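library(dplyr)

set.seed(123)

# Small example data frame: three numeric variables plus a weight column
df <- data.frame(v1 = runif(4),
                 v2 = c(1, 2, 2, 1),
                 v3 = 1:4,
                 wgt = c(0.1, 0.5, 1, 2))

# Multiply each of v1..v3 by the row's weight (numeric columns only);
# across() is the current replacement for mutate_at()
df %>% mutate(across(v1:v3, ~ .x * wgt))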
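
Multiplying columns by the weight only makes sense for numeric variables; for categorical columns such as gender, a weighted frequency table is usually what is wanted. A minimal base-R sketch of that idea follows. The data frame `toy` and the weight column `wt` are made up for illustration and are not the asker's data.

```r
# Hypothetical toy data standing in for the survey; all names are assumptions.
toy <- data.frame(
  gender     = factor(c("Female", "Male", "Female", "Male")),
  urbanicity = factor(c("Urban", "Urban", "Rural", "Urban")),
  wt         = c(0.5, 1.5, 1.0, 2.0)
)

# Names of the categorical columns to tabulate (every factor column).
cat_cols <- names(toy)[sapply(toy, is.factor)]

# One weighted frequency table per categorical column.
# xtabs() sums the left-hand side of the formula (the weight) within each level.
weighted_tables <- lapply(
  setNames(cat_cols, cat_cols),
  function(col) xtabs(reformulate(col, response = "wt"), data = toy)
)

weighted_tables$gender
weighted_tables$urbanicity
```

This still repeats the weight inside one helper rather than attaching it to the data set itself, so it is a convenience loop and not a true equivalent of the SPSS weighting switch.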

Peter

You can use the survey package:

library(survey)

# Declare the design once; every svy*() call then applies the weights
my_design <- svydesign(id = ~1, weights = ~Weight_Age_Gender_Income, data = master.data.table)

# One-way weighted tables
svytable(~gender, design = my_design, na.rm = TRUE)
svytable(~urbanicity, design = my_design, na.rm = TRUE)

# Two-way weighted table
svytable(~gender + urbanicity, design = my_design, na.rm = TRUE)

If you want accurate standard errors, you will also need to supply cluster and strata information to svydesign(). SPSS doesn't require this, but it also doesn't give accurate standard errors.
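
As a rough illustration of the difference, the sketch below builds the same toy data into a weights-only design and into a design that also declares clusters and strata. The data frame and the column names `stratum`, `psu`, `wt` and `income` are invented for this example; a real survey would take them from its sampling documentation.

```r
library(survey)

# Hypothetical toy data: two strata, two clusters (PSUs) per stratum.
toy <- data.frame(
  stratum = rep(c("A", "B"), each = 4),
  psu     = rep(1:4, each = 2),
  wt      = c(1, 1, 2, 2, 1, 3, 2, 2),
  income  = c(10, 20, 30, 40, 50, 60, 70, 80)
)

# Weights only: estimates are weighted, but the standard errors
# are computed as if every row were sampled independently.
d_simple <- svydesign(ids = ~1, weights = ~wt, data = toy)

# Clusters and strata declared as well as the weights.
d_full <- svydesign(ids = ~psu, strata = ~stratum, weights = ~wt, data = toy)

m_simple <- svymean(~income, d_simple)
m_full   <- svymean(~income, d_full)

# The point estimate is the same under both designs;
# only the standard error depends on the declared design.
coef(m_simple)
coef(m_full)
SE(m_simple)
SE(m_full)
```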

The survey package also has a range of regression models and graphics for weighted survey data.
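
For instance, a weighted linear regression and a weighted histogram might look like the sketch below. The toy data and the names `age`, `score` and `wt` are assumptions made up for illustration, and the weights-only design is used just to keep the example short.

```r
library(survey)

# Hypothetical toy data; all column names are illustrative assumptions.
toy <- data.frame(
  age   = c(20, 25, 30, 35, 40, 45),
  score = c(2.1, 2.9, 4.2, 4.8, 6.1, 6.9),
  wt    = c(0.5, 1.5, 1.0, 2.0, 0.8, 1.2)
)

toy_design <- svydesign(ids = ~1, weights = ~wt, data = toy)

# Weighted linear regression; standard errors are design-based.
fit <- svyglm(score ~ age, design = toy_design)
coef(fit)

# Weighted histogram of the outcome.
svyhist(~score, design = toy_design)
```

The coefficients agree with `lm(score ~ age, data = toy, weights = wt)`; what differs between the two is how the standard errors are computed.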

Thomas Lumley