R: imputation of values in a data frame column by distribution of that variable

Question

I have searched Stack Overflow and Google for this but have not yet found a fitting answer.

I have a data frame column with ages of individuals. Out of around 10,000 observations, 150 are NA. I do not want to impute them with the mean age of the whole column; instead, I want to assign random ages drawn from the distribution of ages in this column.

How do I do that? I tried fiddling around with the MICE package but didn't make much progress.

Do you have a solution for me?

Thank you, corkinabottle

Welcome to Stack Overflow. The brilliant minds here need some details and an example to answer your question effectively. Look here for how: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — Minnow, Dec 29 '20 at 15:13

SteveM · Accepted Answer · 2020-12-29T16:44:15.873


You could simply sample 150 values from your observed (non-missing) ages:

samplevals <- sample(obs[!is.na(obs)], 150)  # obs: the age column; NAs are excluded from the pool
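To put the sampled values back into the data frame, one option is to index the missing rows directly. This is a minimal sketch, assuming a data frame `df` with an `age` column (both names and the simulated data are hypothetical). Using `replace = TRUE` is an added choice, not part of the answer above, so that each imputed value is an independent draw from the observed ages:

```r
set.seed(42)  # only to make the illustration reproducible

# Hypothetical data: 10000 ages, 150 of them set to NA
df <- data.frame(age = sample(18:90, 10000, replace = TRUE))
df$age[sample(nrow(df), 150)] <- NA

# Locate the missing entries and draw replacements from the observed ages
missing  <- is.na(df$age)
observed <- df$age[!missing]
df$age[missing] <- sample(observed, sum(missing), replace = TRUE)

sum(is.na(df$age))  # number of NAs remaining after imputation
```

Because the replacements are drawn from the observed values themselves, the imputed ages follow the empirical distribution of the column rather than collapsing onto the mean.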

You could also stratify your observations by quantile and sample within each quantile range, which increases the chance of drawing values from the tails.
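The stratified variant could look roughly like the following. It is a sketch under assumptions: `obs` is simulated here, the number of bins (`n_bins`) and the equal allocation per bin are illustrative choices, and none of this is confirmed as what the answer's author had in mind:

```r
set.seed(1)  # only to make the illustration reproducible

# Hypothetical observed ages (no NAs) and number of values to impute
obs       <- sample(18:90, 9850, replace = TRUE)
n_missing <- 150
n_bins    <- 5  # assumed number of quantile bins

# Cut the observed ages into quantile bins
breaks <- unique(quantile(obs, probs = seq(0, 1, length.out = n_bins + 1)))
bin    <- cut(obs, breaks = breaks, include.lowest = TRUE)
groups <- split(obs, bin)

# Spread the draws as evenly as possible across the bins
counts <- tabulate(rep_len(seq_along(groups), n_missing), nbins = length(groups))

# Sample within each bin, then combine into one vector
samplevals <- unlist(
  Map(function(x, k) x[sample.int(length(x), k, replace = TRUE)], groups, counts),
  use.names = FALSE
)
```

Since each quantile bin holds roughly the same share of the data, drawing the same number of values from each bin keeps the overall distribution close to the observed one while guaranteeing that the lowest and highest bins are represented.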

edited Dec 29 '20 at 16:44

answered Dec 29 '20 at 16:13

