I have done some manipulations as below to arrive at the following dataframe:
df
cluster.kmeans variable max mean median min sd
1 1 MonthlySMS 191 90.32258 71.0 8 56.83801
2 1 SixMonthlyData 1085 567.09677 573.0 109 275.46994
3 1 SixMonthlySMS 208 94.38710 86.0 29 56.27828
4 1 ThreeMonthlyData 1038 563.03226 573.0 94 275.51340
5 1 ThreeMonthlySMS 199 88.35484 76.0 6 59.15491
6 2 MonthlySMS 155 53.18815 57.0 1 31.64533
7 2 SixMonthlyData 574 280.27352 280.5 -48 139.75252
8 2 SixMonthlySMS 167 57.77526 47.0 1 33.49210
9 2 ThreeMonthlyData 548 280.89547 279.0 -11 137.54755
10 2 ThreeMonthlySMS 149 53.68641 50.5 3 31.40001
11 3 MonthlySMS 215 135.60202 137.0 49 34.09794
12 3 SixMonthlyData 1046 541.76322 557.0 2 258.90622
13 3 SixMonthlySMS 314 152.40302 152.0 27 45.55642
14 3 ThreeMonthlyData 1064 541.50378 558.0 10 255.35560
15 3 ThreeMonthlySMS 240 146.00756 146.0 54 37.06427
16 4 MonthlySMS 136 49.93980 54.5 1 31.47778
17 4 SixMonthlyData 1091 788.09365 805.0 503 145.67031
18 4 SixMonthlySMS 190 57.50167 46.0 1 33.66157
19 4 ThreeMonthlyData 1073 785.19398 799.5 500 142.90054
20 4 ThreeMonthlySMS 141 50.88796 46.0 1 31.07977
I would like to order the variable
column based on these strings:
top.vars_kmeans
[1] "ThreeMonthlySMS" "SixMonthlyData" "ThreeMonthlyData"
[4] "MonthlySMS" "SixMonthlySMS"
I could do it using sqldf as below:
library(sqldf)
a <- c(1,2,3,4,5)
a <- data.frame(top.vars_kmeans,a)
a <- sqldf('select a1.* ,b1.a from "MS.DATA.STATS.KMEANS" a1 inner join a b1
on a1.variable=b1."top.vars_kmeans"')
a <- sqldf('select * from a order by "cluster.kmeans",a')
a$a <- NULL
a
cluster.kmeans variable max mean median min sd
1 1 ThreeMonthlySMS 199 88.35484 76.0 6 59.15491
2 1 SixMonthlyData 1085 567.09677 573.0 109 275.46994
3 1 ThreeMonthlyData 1038 563.03226 573.0 94 275.51340
4 1 MonthlySMS 191 90.32258 71.0 8 56.83801
5 1 SixMonthlySMS 208 94.38710 86.0 29 56.27828
6 2 ThreeMonthlySMS 149 53.68641 50.5 3 31.40001
7 2 SixMonthlyData 574 280.27352 280.5 -48 139.75252
8 2 ThreeMonthlyData 548 280.89547 279.0 -11 137.54755
9 2 MonthlySMS 155 53.18815 57.0 1 31.64533
10 2 SixMonthlySMS 167 57.77526 47.0 1 33.49210
11 3 ThreeMonthlySMS 240 146.00756 146.0 54 37.06427
12 3 SixMonthlyData 1046 541.76322 557.0 2 258.90622
13 3 ThreeMonthlyData 1064 541.50378 558.0 10 255.35560
14 3 MonthlySMS 215 135.60202 137.0 49 34.09794
15 3 SixMonthlySMS 314 152.40302 152.0 27 45.55642
16 4 ThreeMonthlySMS 141 50.88796 46.0 1 31.07977
17 4 SixMonthlyData 1091 788.09365 805.0 503 145.67031
18 4 ThreeMonthlyData 1073 785.19398 799.5 500 142.90054
19 4 MonthlySMS 136 49.93980 54.5 1 31.47778
20 4 SixMonthlySMS 190 57.50167 46.0 1 33.66157
I am just curious to know if this could be achieved using dplyr......my understanding of this wonderful package will get enhanced....
need help here!