I am currently grouping my variables in R: character variables manually, numeric (continuous) by equal percentage of population.
For equal % of population I use cut2(var, g = number_of_bins) from the Hmisc package.
I have continuous variables such as var = TotalPaid/TotalDue, which take special values as follows:
if TotalPaid AND TotalDue are 0 then var = 999 # Neither have paid nor have anything due
else if TotalPaid = 0 then var = 998 # Have Due but haven't paid anything
else if TotalDue = 0 then var = 997 # Have Paid but the due is 0
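For concreteness, the coding rules above could be written in R roughly as follows. This is only a sketch: the function name `encode_ratio` and its argument names are hypothetical and not part of my actual code.

```r
# Hypothetical helper: compute TotalPaid/TotalDue, replacing the three
# degenerate cases with the special codes 999, 998 and 997.
encode_ratio <- function(total_paid, total_due) {
  ifelse(total_paid == 0 & total_due == 0, 999,   # nothing paid, nothing due
  ifelse(total_paid == 0, 998,                    # something due, nothing paid
  ifelse(total_due == 0, 997,                     # something paid, nothing due
         total_paid / total_due)))                # regular ratio
}

# Toy input covering all four cases
encode_ratio(c(0, 0, 50, 20), c(0, 100, 0, 40))
```

The checks are ordered so that the "both zero" case is tested before the single-zero cases, matching the if / else if order above.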
What I am aiming for is to use cut2 to split the variable into equal-sized groups that are NOT based on any special value, i.e. keep each special value as its own group and split only the rest of the variable into equal groups.
Example of the resulting groups for var (if I decide to split the variable into groups of 5% of the population each):
**Value** **%pop**
0 x% of population
Range1 5% of population
Range2 5% of population
... 5% of population
999 y% of population
998 z% of population
997 p% of population
Note: 0 is actually not a valid value because of the way the special values are coded in the example above; I have included it just for the sake of the example.
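To make the target behaviour concrete, here is a minimal base-R sketch of the kind of grouping described above. It deliberately uses `quantile()` and `cut()` instead of `cut2`, so it runs without Hmisc; the helper name `bin_keep_special` and its arguments `special` and `n_bins` are hypothetical, and the toy data are made up for illustration.

```r
# Hypothetical helper: bin only the non-special values into (roughly)
# equal-population groups and keep each special value as its own level.
bin_keep_special <- function(v, special, n_bins = 20) {
  is_special <- v %in% special
  # Break points come from the non-special values only; unique() guards
  # against duplicated breaks when the data contain many ties.
  brks <- unique(quantile(v[!is_special],
                          probs = seq(0, 1, length.out = n_bins + 1)))
  out <- as.character(v)  # special values keep their code as the label
  out[!is_special] <- as.character(
    cut(v[!is_special], breaks = brks, include.lowest = TRUE)
  )
  factor(out)
}

# Toy data: 100 regular values plus three special codes
set.seed(1)
v <- c(runif(100), rep(997, 5), rep(998, 10), rep(999, 7))
grouped <- bin_keep_special(v, special = c(997, 998, 999))

# Share of the population in each group, in percent
round(100 * prop.table(table(grouped)), 1)
```

With heavily tied data (such as a large mass at a single value) the quantile breaks collapse and the groups are no longer equal in size, which is why I would still like to treat such values as special.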
Reproducible example:
library(Hmisc) # provides cut2
###Data
x<-structure(list(PayCurrMonth_CurrMPV = c(1, 1, 1, 1.1111111111,
999, 4.7619047619, 6.1407407407, 1, 1, 1, 1, 997, 1, 2.9666666667,
1, 1.1666666667, 1, 998, 998, 1, 1, 1, 1, 1, 1.0256410256, 998,
3.3333333333, 6.5, 5, 1, 1, 5363.6363636, 998, 1.0416666667,
1, 1, 998, 999, 329.34508816, 1, 4, 998, 1, 1, 1, 998, 999, 2.5,
999, 1, 998, 1, 1, 1, 1, 1.1111111111, 1, 997, 997, 2, 1, 1,
1, 6, 999, 1, 1.037037037, 3.962962963, 1, 1, 1, 999, 7.9333333333,
1.2820512821, 1, 1.3333333333, 1, 7.3620273532, 1, 1, 1, 1.5833333333,
998, 2.8333333333, 1.1111111111, 10.21751051, 998, 2, 1, 997,
1, 1, 1, 1, 5.3333333333, 2.5166666667, 1, 1, 1.0833333333, 1,
1, 7.0024444444, 1, 0.8333333333, 999, 1.3333333333, 1, 1, 1,
629.7, 0.4, 1, 1, 1, 998, 1, 998, 1, 3.001322314, 1, 1, 1, 1,
1, 997, 0.825, 1, 1, 999, 1, 1, 338.15789474, 998, 1, 1, 1, 1,
1.0833333333, 1, 1.1111111111, 1, 1.7047619048, 0.8333333333,
998, 1, 1, 1, 999, 1, 4.5071666667, 1.1111111111, 1, 998, 1,
1, 1, 1, 0.2941666667, 3, 2.6666666667, 3.5816618911, 1, 998,
1, 1, 1, 1, 997, 1, 1, 1, 1, 1.06, 997, 1, 2, 1.3333333333, 3.2222222222,
4.7555555556, 999, 1, 1, 1, 1, 1, 1, 1, 1, 999, 1, 3.3333333333,
1, 1.6666666667, 1, 1, 1, 1, 1, 1.3888888889, 1, 4.5714285714,
2.0952380952, 1, 1, 999, 1, 998, 1.1111111111, 1, 1, 1, 999,
1, 8.8933333333, 1.0666666667, 1, 1.0666666667, 998, 1, 1, 2.5,
1, 115.77998197, 997, 1, 997, 1, 2, 7.5555555556, 2.6666666667,
1.1666666667, 1, 999, 2.4, 1.6666666667, 2.1111111111, 2.1111111111,
998, 2, 998, 1.0833333333, 1, 1, 1, 50, 1.0533333333, 1, 2, 1,
0.303030303, 1, 1.1111111111, 6.7066666667, 998, 1, 6.6666666667,
2, 1)), .Names = "PayCurrMonth_CurrMPV", row.names = c(NA, -258L
), class = "data.frame")
###split data into special and non special values
x1<-subset(x,PayCurrMonth_CurrMPV %in% c(997,998,999,1))
x2<-subset(x,!PayCurrMonth_CurrMPV %in% c(997,998,999,1))
###apply equal % of pop only to non special values
x2$PayCurrMonth_CurrMPV<-cut2(x2$PayCurrMonth_CurrMPV, m = floor( ( 5 / 100 ) * nrow( x2 ) ) )
###combine back special and non special values to form-back the variable - now grouped
x_all<-rbind(x1,x2)
This is what I have so far:
z<-x[,1] %in% c(997,998,999,1)
f<-cut2(x$PayCurrMonth_CurrMPV[!z], m = floor( ( 5 / 100 ) * nrow( x ) ))
x$PayCurrMonth_CurrMPV[!z]<-as.character(f)
Does anyone have a smart idea for how to do this more easily?
Thanks in advance