Include survey weighting variable

Question

I am working with survey data in Python. There is a weighting variable based on age, gender and region which should be included in the calculations (to make the data representative of the population).

The weighting variable is a simple decimal number, most often between 0.9 and 1.2.

I don't know how to include it in simple calculations. Most of the variables have "Yes/no/not sure"-values or other categories.

For example, how can I include the weighting variable here:

survey['my_variable'].value_counts(normalize=True)

Perhaps this would help? https://stackoverflow.com/questions/53980468/what-is-the-recommended-way-to-compute-a-weighted-sum-of-selected-columns-of-a-p/53980559 — marsnebulasoup, Aug 18 '20 at 15:52
I am not sure. My variables include categories and I don't want to convert them to numbers. The weighting variable is a column in my data frame. — Florian Seliger, Aug 19 '20 at 06:28

score 1 · Answer 1 · answered Aug 19 '20 at 08:27

I think I have found a solution based on this question: "Groupby with weight".

So my strategy is to first aggregate the data frame by survey week, country and the categorical variable I am interested in:

survey_c.groupby(['week','country','my_cat_var']).weight.sum().reset_index(name='count')
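To illustrate what this aggregation produces, here is a minimal sketch on invented toy data. The column names follow the line above; the values and the `weighted_counts` name are assumptions made up for the example. Summing the weight column per group gives a weighted count, where each respondent contributes their weight instead of 1.

```python
import pandas as pd

# Toy data: column names match the snippet above, values are invented.
survey_c = pd.DataFrame({
    "week": [1, 1, 1, 1, 2, 2],
    "country": ["DE", "DE", "DE", "FR", "DE", "DE"],
    "my_cat_var": ["yes", "no", "yes", "yes", "no", "no"],
    "weight": [1.0, 0.9, 1.2, 1.1, 1.0, 0.9],
})

# Sum the weights within each (week, country, category) cell.
# Each row contributes its weight rather than a plain count of 1.
weighted_counts = (
    survey_c.groupby(["week", "country", "my_cat_var"])["weight"]
    .sum()
    .reset_index(name="count")
)

print(weighted_counts)
```

The result has one row per combination of week, country and category that actually occurs in the data.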

Afterwards, I can use the aggregated data for plotting or whatever.
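For the original one-liner, `value_counts(normalize=True)`, the same idea can give weighted proportions: sum the weights per category and divide by the total weight. This is only a sketch; it assumes the weight column is called `weight`, and the toy values and the `weighted_share` name are invented for illustration.

```python
import pandas as pd

# Toy data: "weight" as the column name is an assumption, values are invented.
survey = pd.DataFrame({
    "my_variable": ["yes", "no", "yes", "not sure"],
    "weight": [1.0, 0.9, 1.2, 0.9],
})

# Weighted analogue of survey['my_variable'].value_counts(normalize=True):
# the share of total weight that falls into each category.
weighted_share = (
    survey.groupby("my_variable")["weight"].sum() / survey["weight"].sum()
).sort_values(ascending=False)

print(weighted_share)
```

The shares sum to 1, and the categories stay as labels, so nothing has to be converted to numbers.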

If anyone has a comment or a better strategy, please raise your hand
