I am working with survey data in Python. There is a weighting variable based on age, gender and region which should be included in the calculations (to make the data representative of the population).

The weighting variable is a simple decimal number, usually between 0.9 and 1.2.

I don't know how to include it in simple calculations. Most of the variables take "Yes/No/Not sure" values or other categories.

For example, how can I include the weighting variable here:

survey['my_variable'].value_counts(normalize=True)
martineau
Florian Seliger
  • Perhaps this would help? https://stackoverflow.com/questions/53980468/what-is-the-recommended-way-to-compute-a-weighted-sum-of-selected-columns-of-a-p/53980559 – marsnebulasoup Aug 18 '20 at 15:52
  • I am not sure. My variables include categories and I don't want to convert them to numbers. The weighting variable is a column in my data frame. – Florian Seliger Aug 19 '20 at 06:28

1 Answer

I think I have found a solution based on this: Groupby with weight

So my strategy is to first aggregate the data frame by survey week, country and the categorical variable I am interested in:

survey_c.groupby(['week','country','my_cat_var']).weight.sum().reset_index(name='count')

Afterwards, I can use the aggregated data for plotting or further analysis.
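For the value_counts(normalize=True) case from the question, the same idea can be taken one step further: sum the weights per category and divide by the total weight. The following is only a minimal sketch under stated assumptions. The toy data is made up for illustration, and the column names (week, country, my_cat_var, weight) are assumed to match the ones used above.

```python
import pandas as pd

# Hypothetical toy data; the column names mirror the ones used above.
survey_c = pd.DataFrame({
    "week": [1, 1, 1, 1, 2, 2],
    "country": ["DE", "DE", "DE", "DE", "DE", "DE"],
    "my_cat_var": ["Yes", "No", "Yes", "Not sure", "Yes", "No"],
    "weight": [1.0, 1.2, 0.9, 0.9, 1.1, 0.9],
})

# Weighted counterpart of value_counts(normalize=True):
# sum the weights per category, then divide by the total weight.
weighted_share = (
    survey_c.groupby("my_cat_var")["weight"].sum() / survey_c["weight"].sum()
)

# Weighted counts per week, country and category ...
counts = (
    survey_c.groupby(["week", "country", "my_cat_var"])["weight"]
    .sum()
    .reset_index(name="count")
)

# ... and the share of each category within its week/country group.
group_total = counts.groupby(["week", "country"])["count"].transform("sum")
counts["share"] = counts["count"] / group_total

print(weighted_share)
print(counts)
```

The categories stay as they are and are never converted to numbers; only the weight column is summed.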

If anyone has a comment or a better strategy, please raise your hand.

Florian Seliger