
I am currently grouping my variables in R: character variables manually, and numeric (continuous) variables into bins that each hold an equal percentage of the population.

For the equal-percentage binning I use cut2(var, g = number_of_bins) from the Hmisc package. I have continuous variables such as var = TotalPaid/TotalDue that carry special values, coded as follows:

if TotalPaid = 0 AND TotalDue = 0 then var = 999 # Has paid nothing and has nothing due
else if TotalPaid = 0 then var = 998 # Has an amount due but has not paid anything
else if TotalDue = 0 then var = 997 # Has paid but the amount due is 0
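
As a rough sketch, the coding above could be written in R as follows, assuming TotalPaid and TotalDue are numeric vectors of the same length (the sample values below are made up for illustration):

```r
# Hypothetical sample inputs, one element per customer
TotalPaid <- c(0, 0, 50, 100, 30)
TotalDue  <- c(0, 80, 0, 100, 60)

# Nested ifelse() mirrors the if / else if chain:
# the first matching condition decides the code, otherwise the ratio is kept
var <- ifelse(TotalPaid == 0 & TotalDue == 0, 999,
       ifelse(TotalPaid == 0, 998,
       ifelse(TotalDue == 0, 997, TotalPaid / TotalDue)))

print(var)
```

The order of the conditions matters: the "both are 0" case has to be tested first, otherwise it would be caught by one of the single-zero cases.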

What I am aiming for is to use cut2 to form equal-sized groups only from the values that are NOT special: each special value should stay as its own group, and the rest of the variable should be split into groups. Example of the resulting groups (if I decide to split the variable into groups of 5% of the population):

**Value**            **%pop**

0                x% of population
Range1           5% of population
Range2           5% of population
...              5% of population
999              y% of population
998              z% of population
997              p% of population

Note: 0 is actually not a valid value given the way special values are coded above; I have included it just for the sake of the example.

Reproducible example:

###Data
x<-structure(list(PayCurrMonth_CurrMPV = c(1, 1, 1, 1.1111111111, 
999, 4.7619047619, 6.1407407407, 1, 1, 1, 1, 997, 1, 2.9666666667, 
1, 1.1666666667, 1, 998, 998, 1, 1, 1, 1, 1, 1.0256410256, 998, 
3.3333333333, 6.5, 5, 1, 1, 5363.6363636, 998, 1.0416666667, 
1, 1, 998, 999, 329.34508816, 1, 4, 998, 1, 1, 1, 998, 999, 2.5, 
999, 1, 998, 1, 1, 1, 1, 1.1111111111, 1, 997, 997, 2, 1, 1, 
1, 6, 999, 1, 1.037037037, 3.962962963, 1, 1, 1, 999, 7.9333333333, 
1.2820512821, 1, 1.3333333333, 1, 7.3620273532, 1, 1, 1, 1.5833333333, 
998, 2.8333333333, 1.1111111111, 10.21751051, 998, 2, 1, 997, 
1, 1, 1, 1, 5.3333333333, 2.5166666667, 1, 1, 1.0833333333, 1, 
1, 7.0024444444, 1, 0.8333333333, 999, 1.3333333333, 1, 1, 1, 
629.7, 0.4, 1, 1, 1, 998, 1, 998, 1, 3.001322314, 1, 1, 1, 1, 
1, 997, 0.825, 1, 1, 999, 1, 1, 338.15789474, 998, 1, 1, 1, 1, 
1.0833333333, 1, 1.1111111111, 1, 1.7047619048, 0.8333333333, 
998, 1, 1, 1, 999, 1, 4.5071666667, 1.1111111111, 1, 998, 1, 
1, 1, 1, 0.2941666667, 3, 2.6666666667, 3.5816618911, 1, 998, 
1, 1, 1, 1, 997, 1, 1, 1, 1, 1.06, 997, 1, 2, 1.3333333333, 3.2222222222, 
4.7555555556, 999, 1, 1, 1, 1, 1, 1, 1, 1, 999, 1, 3.3333333333, 
1, 1.6666666667, 1, 1, 1, 1, 1, 1.3888888889, 1, 4.5714285714, 
2.0952380952, 1, 1, 999, 1, 998, 1.1111111111, 1, 1, 1, 999, 
1, 8.8933333333, 1.0666666667, 1, 1.0666666667, 998, 1, 1, 2.5, 
1, 115.77998197, 997, 1, 997, 1, 2, 7.5555555556, 2.6666666667, 
1.1666666667, 1, 999, 2.4, 1.6666666667, 2.1111111111, 2.1111111111, 
998, 2, 998, 1.0833333333, 1, 1, 1, 50, 1.0533333333, 1, 2, 1, 
0.303030303, 1, 1.1111111111, 6.7066666667, 998, 1, 6.6666666667, 
2, 1)), .Names = "PayCurrMonth_CurrMPV", row.names = c(NA, -258L
), class = "data.frame")

library(Hmisc)  # provides cut2()

### Split the data into special and non-special values
### (1 is treated as special too: customers who paid exactly the amount due)
x1 <- subset(x, PayCurrMonth_CurrMPV %in% c(997, 998, 999, 1))
x2 <- subset(x, !PayCurrMonth_CurrMPV %in% c(997, 998, 999, 1))

### Apply equal % of population only to the non-special values
x2$PayCurrMonth_CurrMPV <- cut2(x2$PayCurrMonth_CurrMPV, m = floor((5 / 100) * nrow(x2)))

### Combine special and non-special values back into one variable, now grouped
x_all <- rbind(x1, x2)

This is what I have so far:

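### Same idea without splitting the data frame
z <- x[, 1] %in% c(997, 998, 999, 1)  # TRUE for special values
### Note: m is based on nrow(x) here, not on the number of non-special rows
f <- cut2(x$PayCurrMonth_CurrMPV[!z], m = floor((5 / 100) * nrow(x)))
x$PayCurrMonth_CurrMPV[!z] <- as.character(f)  # the column becomes character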
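
The same idea could be wrapped in a small helper. The sketch below is only an illustration: it uses base R's cut() and quantile() rather than cut2, so the bin edges will not necessarily match what cut2 produces, and group_with_specials, specials and n_groups are hypothetical names. It assumes there are at least two distinct non-special values.

```r
# Sketch: bin only the non-special values into roughly equal-sized groups,
# keeping each special value as its own group.
group_with_specials <- function(v, specials, n_groups) {
  is_special <- v %in% specials
  out <- as.character(v)  # special values keep their own value as label

  # Quantile-based break points computed on the non-special values only;
  # unique() drops duplicated breaks that ties would otherwise create
  brks <- unique(quantile(v[!is_special],
                          probs = seq(0, 1, length.out = n_groups + 1),
                          names = FALSE))

  # Replace the non-special entries with their interval label
  out[!is_special] <- as.character(
    cut(v[!is_special], breaks = brks, include.lowest = TRUE))
  factor(out)
}

# Made-up data: 100 ordinary values plus 5, 10 and 15 special codes
set.seed(1)
v <- c(runif(100, 0, 10), rep(c(997, 998, 999), times = c(5, 10, 15)))
g <- group_with_specials(v, specials = c(997, 998, 999), n_groups = 5)

# Percentage of the population in each group
print(round(100 * prop.table(table(g)), 1))
```

With the example data above, the call would look like group_with_specials(x$PayCurrMonth_CurrMPV, c(997, 998, 999, 1), 20). Heavily tied values can leave fewer than n_groups bins, because duplicated break points are dropped.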

Does anyone have a smart idea for how to do this easily?

Thanks in advance

Bullzeye
  • Interesting question, but a reproducible example (input) is missing, which makes it difficult to help. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example. Initial thought: `split` the data on a logical index with something like `split(df, df$Value %in% 997:999)` and then run the `cut` function on one element of the resulting list. – talat May 20 '15 at 08:06
  • Thanks @docendodiscimus. I quickly added a reproducible example that I hope will make things clearer. In the example I am also separating out 1, since these are the customers who have paid exactly the amount due. BR – Bullzeye May 20 '15 at 08:42
  • cut2(var, g = 20) is a shorter version of what we would like to achieve with the function – Bullzeye May 20 '15 at 11:05
