I have a 2319-row data frame df; I would like to sort the continuous variable var and split it into a specified number of groups with an equal (or as close as possible) number of observations per group. I have seen a similar post where cut2() from Hmisc was recommended, but it does not always provide an equal number of observations per group. For example, this is what I get using cut2():
library(Hmisc)
df$Group <- as.numeric(cut2(df$var, g = 10))
var Group
1415 1
1004 1
1285 1
2099 2
2119 2
2427 4
...
table(df$Group)
1 2 3 4 5 6 7 8 9 10
232 232 241 223 233 246 219 243 226 224
Has anyone used or written something that relies not on the underlying distribution of the variable (e.g. var), but on the number of observations in the data and the number of groups specified? I do have non-unique values.
What I want is a more equal number of observations, for example:
table(df$Group)
1 2 3 4 5 6 7 8 9 10
232 232 231 233 231 233 232 231 231 233
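To make the request concrete, here is a minimal sketch of the behaviour I am after, using base R only. The helper name equal_groups and the simulated data are hypothetical, made up for illustration. It ranks the observations, breaking ties by order of appearance, and maps the ranks onto the groups, so group sizes depend only on the number of observations and the number of groups. The trade-off, which I assume is acceptable, is that tied values can end up in adjacent groups.

```r
# Hypothetical helper (not from Hmisc): assign near-equal-sized groups by rank.
equal_groups <- function(x, n_groups) {
  # Count only non-missing values; rank() leaves NA ranks as NA by default
  n <- sum(!is.na(x))
  # ties.method = "first" gives every observation a distinct rank,
  # so non-unique values cannot inflate or shrink a group
  r <- rank(x, ties.method = "first")
  # Map ranks 1..n onto groups 1..n_groups
  as.integer(ceiling(r * n_groups / n))
}

# Simulated stand-in for my data (assumption: 2319 rows, many repeated values)
set.seed(1)
df <- data.frame(var = sample(1000:3000, 2319, replace = TRUE))
df$Group <- equal_groups(df$var, 10)
table(df$Group)
```

With this approach the group sizes should differ by at most one observation. I believe dplyr::ntile() does something similar, but I have not checked how it treats ties.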