I am trying to use sample_n by age group (Bage), gender, and employment to create a new column with ethnicity. I've found a way to do it, but each sample takes 9 lines of code, and the size changes each time because I am distributing different numbers of people depending on their ethnic group.
The example below shows the code for randomly assigning the ethnic group defined by the census as 'Other' to unemployed males in the 16-24 age group. The example data is taken from the full dataset. I would then repeat all of these lines (changing the specifics: Bage, gender, employment, size) for every employment type and ethnicity, so it is a long, slow process. I've looked at creating loops or functions, but I keep getting stuck because I need a different sample size for each group rather than the same sample size throughout the dataset.
Any advice on reducing the length of code and time to do this would be greatly appreciated.
Sample input data, showing males in the 16-24 age group (Bage == 16) for some employment types:
ID Ages Bage Gender Employment Ethnicity
77 16 16 16 Male PT
78 78 16 16 Male PT
79 79 16 16 Male PT
80 80 16 16 Male PT
81 81 16 16 Male PT
82 82 16 16 Male PT
83 83 16 16 Male PT
91 91 16 16 Male PT
92 92 16 16 Male PT
93 93 16 16 Male PT
94 94 16 16 Male PT
95 95 16 16 Male PT
96 96 16 16 Male PT
97 97 16 16 Male PT
98 98 16 16 Male PT
99 99 16 16 Male PT
100 100 16 16 Male PT
101 101 16 16 Male PT
102 102 16 16 Male PT
127 127 16 16 Male FT
128 128 16 16 Male FT
129 129 16 16 Male FT
130 130 16 16 Male FT
131 131 16 16 Male FT
132 132 16 16 Male FT
133 133 16 16 Male FT
134 134 16 16 Male FT
135 135 16 16 Male FT
136 136 16 16 Male SEFT
137 137 16 16 Male UN
138 138 16 16 Male UN
139 139 16 16 Male UN
140 140 16 16 Male UN
141 141 16 16 Male UN
142 142 16 16 Male UN
143 143 16 16 Male UN
... ... .. .. ... ..
Current code:
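library(dplyr)  # sample_n
library(tidyr)  # unite
# Draw one unassigned (Ethnic == "0") unemployed male aged 16-24
UNOTH=sample_n(EdUNAS[EdUNAS$Bage=="16" & EdUNAS$Gender=="Male" & EdUNAS$Employment=="UN" & EdUNAS$Ethnic=="0",],size=1, replace=FALSE)
UNOTH["Ethnic"]="Other"
# Join the sampled row back onto the full data by ID
Edunoth=merge(EdUNAS, UNOTH, by = "ID", all = TRUE)
# Drop the duplicated columns brought in by the merge
Edunoth$Bage.x.x.y=NULL
Edunoth$Ages.x.x.y=NULL
Edunoth$Gender.x.x.y=NULL
Edunoth$Employment.x.x.y=NULL
# Blank out NAs, then paste the two Ethnic columns into one
Edunoth[is.na(Edunoth)] = ''
EdUNOTH=unite(Edunoth, Ethnic, Ethnic.x:Ethnic.y, sep='')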
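For a single group, the merge and unite steps may not be needed: a possible shortcut is to sample row positions and write the label straight into the Ethnic column. The sketch below uses base R only and a toy stand-in for EdUNAS (the seven UN rows from the sample, with the other columns simplified), so the data frame contents and the seed are assumptions for illustration.

```r
# Toy stand-in for EdUNAS (assumption); "0" marks a person with no ethnicity yet.
EdUNAS <- data.frame(
  ID = 137:143,
  Bage = "16",
  Gender = "Male",
  Employment = "UN",
  Ethnic = "0",
  stringsAsFactors = FALSE
)

set.seed(1)  # only so the toy run is repeatable

# Row positions of the people still unassigned in this group.
eligible <- which(EdUNAS$Bage == "16" & EdUNAS$Gender == "Male" &
                  EdUNAS$Employment == "UN" & EdUNAS$Ethnic == "0")

# Draw positions without replacement. Indexing into `eligible` avoids the
# sample() pitfall where a length-one vector x is treated as 1:x.
chosen <- eligible[sample.int(length(eligible), size = 1)]

# Write the label in place; no merge or unite required.
EdUNAS$Ethnic[chosen] <- "Other"
```

Only the filter values and `size` change between groups, which is what makes this form easier to wrap in a loop or function.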
Wanted output: the Ethnicity column filled in based on the proportions I know from the census data.
ID Ages Bage Gender Employment Ethnicity
77 16 16 16 Male PT White
78 78 16 16 Male PT White
79 79 16 16 Male PT White
80 80 16 16 Male PT White
81 81 16 16 Male PT White
82 82 16 16 Male PT White
83 83 16 16 Male PT Asian
91 91 16 16 Male PT White
92 92 16 16 Male PT White
93 93 16 16 Male PT Other
94 94 16 16 Male PT White
95 95 16 16 Male PT White
96 96 16 16 Male PT White
97 97 16 16 Male PT White
98 98 16 16 Male PT Asian
99 99 16 16 Male PT White
100 100 16 16 Male PT White
101 101 16 16 Male PT White
102 102 16 16 Male PT White
127 127 16 16 Male FT White
128 128 16 16 Male FT White
129 129 16 16 Male FT White
130 130 16 16 Male FT White
131 131 16 16 Male FT White
132 132 16 16 Male FT White
133 133 16 16 Male FT White
134 134 16 16 Male FT White
135 135 16 16 Male FT White
136 136 16 16 Male SEFT White
137 137 16 16 Male UN White
138 138 16 16 Male UN White
139 139 16 16 Male UN White
140 140 16 16 Male UN White
141 141 16 16 Male UN Asian
142 142 16 16 Male UN White
143 143 16 16 Male UN White
... ... .. .. ... .. ...
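Because the sample size differs for every group, one option is to keep the sizes in a lookup table and loop over its rows. This is a minimal sketch in base R, not a tested solution for the full dataset: `people`, `counts`, and `assign_ethnicity` are hypothetical names, the IDs are made up, and the counts are simply read off the wanted output above rather than taken from the census.

```r
# Toy people table (assumption): 19 PT, 9 FT, 1 SEFT, 7 UN males aged 16-24,
# all starting with Ethnic == "0" (not yet assigned).
people <- data.frame(
  ID = 1:36,
  Bage = "16",
  Gender = "Male",
  Employment = rep(c("PT", "FT", "SEFT", "UN"), times = c(19, 9, 1, 7)),
  Ethnic = "0",
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: one row per group and ethnicity, n = people to label.
counts <- data.frame(
  Bage = "16",
  Gender = "Male",
  Employment = c("PT", "PT", "PT", "FT", "SEFT", "UN", "UN"),
  Ethnic = c("White", "Asian", "Other", "White", "White", "White", "Asian"),
  n = c(16, 2, 1, 9, 1, 6, 1),
  stringsAsFactors = FALSE
)

assign_ethnicity <- function(people, counts) {
  for (i in seq_len(nrow(counts))) {
    # Unassigned people matching this row of the lookup table.
    eligible <- which(people$Bage == counts$Bage[i] &
                      people$Gender == counts$Gender[i] &
                      people$Employment == counts$Employment[i] &
                      people$Ethnic == "0")
    # Never ask for more people than remain in the group.
    k <- min(counts$n[i], length(eligible))
    if (k == 0) next
    # Sample positions without replacement and label them in place.
    chosen <- eligible[sample.int(length(eligible), size = k)]
    people$Ethnic[chosen] <- counts$Ethnic[i]
  }
  people
}

set.seed(42)  # only so the toy run is repeatable
result <- assign_ethnicity(people, counts)
```

With the real data, `counts` would hold one row per age group, gender, employment type, and ethnicity, with `n` worked out from the census proportions, so the per-group code would not need to be repeated by hand.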