I clustered my data with the k-medoids algorithm. My problem is the plot: the code below produces a graphic whose axes are the two components:

    library(fpc)
    # tab is my data (described below); the Presence column holds the
    # values that I want to assign to groups or classes
    rez <- pamk(tab$Presence)
    plot(rez$pamobject)

What I want is for the clusters to be shown with the datetime column on the x axis (the first column, Dat_Heure) and the values that make up the clusters on the y axis (the 13th column, Presence).

A subset of my data:

                Dat_Heure    Devtype Devidx Capt_radio Fonction Fonction_nom Spec1 Spec2 Spec3
    1 2015-09-22 00:00:08 IntelliTag      1         17        6       Alarme   -55  2423 -1085
      Spec4 Spec5      Spec6 Presence Spec8 Spec9 Spec10           timeserie
    1  -503   145 1442880008        0     0     0     NA 2015-09-22 00:00:08
Mamoud
  • Please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), including what you've already tried yourself. – Heroka Sep 28 '15 at 15:47
  • I didn't really find anything to try. I just got this graphic by writing this clustering code: `library(fpc); rez <- pamk(tab$Presence); plot(rez$pamobject)` (the values of the Presence column are the values that I want to assign to groups or classes; tab is my data, which I described). I just want the classes that were found to be shown, not on components 1 and 2, but as I said. – Mamoud Sep 29 '15 at 13:07

1 Answer

Then don't use clusplot automagics. By its description,

clusplot uses the functions princomp and cmdscale. These functions are data reduction techniques. They will represent the data in a bivariate plot.

In other words, it projects your data automatically and does not preserve the original coordinate system.

It is meant to be used when you don't have coordinates at all, or when the data is too high-dimensional. It's also meant to automate as much as possible, at the cost of being less customizable. You'll have to do it the long way to understand what is happening.

More precisely, it looks as if you are simply visualizing the data 0,1,2. Note that the first component explains 100% of the variance. So your data has 1 dimension, with 3 values... cluster analysis is a multivariate thing - for one-dimensional data, use other approaches.
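
A minimal sketch of "the long way", not a definitive implementation: it assumes the `cluster` package is installed, uses a small made-up `tab` with only the `Dat_Heure` and `Presence` columns (hourly rows, values invented for illustration), and fixes `k = 3` by hand. It runs `cluster::pam` and then plots the rows in their original coordinates instead of going through `clusplot`.

```r
library(cluster)  # provides pam()

# Hypothetical stand-in for one day of data: one row per hour.
tab <- data.frame(
  Dat_Heure = as.POSIXct("2015-09-22 00:00:00", tz = "UTC") + 3600 * (0:23),
  Presence  = rep(c(0, 1, 0, 2, 0), times = c(8, 5, 2, 3, 6))
)

# k is chosen by hand here (assumption: three distinct Presence values).
fit <- pam(tab["Presence"], k = 3)

# Plot in the original coordinates: time on x, Presence on y,
# point colour = cluster label assigned by pam().
plot(tab$Dat_Heure, tab$Presence,
     col = fit$clustering, pch = 19,
     xlab = "Dat_Heure", ylab = "Presence")
legend("topleft", legend = sort(unique(fit$clustering)),
       col = sort(unique(fit$clustering)), pch = 19, title = "cluster")
```

With one-dimensional data like this, the colours simply mirror the `Presence` values, which is why the clustering step adds little here.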

Has QUIT--Anony-Mousse
  • I didn't use the clusplot function to draw this graphic. Here is what I wrote: `library(fpc); rez <- pamk(tab$Presence); plot(rez$pamobject)` (the values of the Presence column are the values that I want to assign to groups or classes; tab is my data, which I described). – Mamoud Sep 29 '15 at 13:03
  • Note the plot title. clusplot(...) - looks like fpc did it for you then. I don't like the fpc package either, it is super slow. – Has QUIT--Anony-Mousse Sep 29 '15 at 19:46
  • So what do you advise me to use as a clustering algorithm for this? – Mamoud Sep 30 '15 at 09:02
  • PAM is okay, but don't let the package do all the work for you. In particular if the output doesn't work... For example, try `cluster::pam` instead of `fpc::pamk`. And plot yourself, so you have more *control* of what is happening. – Has QUIT--Anony-Mousse Sep 30 '15 at 12:54
  • Yeah, but if I use `cluster::pam`, then I have to define the number of clusters, which I don't know; I want the algorithm to determine it itself. Plus, with pamk I can specify the column that I want the clustering to be based on. – Mamoud Oct 01 '15 at 08:20
  • Try different k yourself. Maybe that part did not work? Get more control over what is happening! What values does `presence` have? I suspect that this is the reason your results are that bad. Unless it is a *continuous* variable (it looks as if that variable contains 0,1,2 only?), you are completely doing the wrong thing. – Has QUIT--Anony-Mousse Oct 01 '15 at 11:40
  • Yeah, I may not have had good code, but the results are fortunately good for me! I wanted to end up with 3 classes and that's what happened! My column "Presence" may contain only 0, or 0 and 1, or values from 0 to 10; it depends on the table I'm working on. Every table is one day (I'm working with datetime variables), and I want the algorithm to give me the time slot number (0 if the person is not present at home, 1 if he is present from .. to .., etc. for 2 and 3 if they exist....). I don't know "k". And now I want to represent the time slots found by the algorithm on a very nice graphic. – Mamoud Oct 01 '15 at 12:33
  • For example, I have 3 rows from 12am to 8am that contain 0 in the "Presence" column; then from 8am to 1pm I have 5 rows that all contain the value 1 in the Presence column; then 2 rows contain 0 from 1pm to 3pm; then 3 or so rows contain the value 2 for the presence from 3pm to 6pm; then the rest is 0 from 6pm to 12am. So the algorithm should find 3 classes (0, 1 and 2); then, minus 1 (the 0 class), I get my two time slots, during which the person is present. You got me? – Mamoud Oct 01 '15 at 13:05
  • That is not clustering, and it isn't what the clustering algorithm produces... wrong tool for this problem. You want to join rows that have the same presence value; but that isn't what PAM will do. It will put *all* the 0s in the same cluster, even when they are *not consecutive*. So if the user is present in every even hour, and absent in every odd hour, you will not have 12 time slots, but 2 (even and odd). Again: **you are not looking for clustering, but a for loop**. – Has QUIT--Anony-Mousse Oct 01 '15 at 17:08
  • OK, I see what you are referring to, and many thanks for your time. But in your example, in the even/odd hour case, I would consider it as one time slot, because I already have a "threshold" to decide whether an absence duration counts as part of the presence time slot or not. So if it gives just two time slots, that's OK; I don't need 12 time slots. Yeah, I know it can be done with a loop, but I wanted to see whether I could get more graphics by doing it with clustering! – Mamoud Oct 02 '15 at 09:10
  • Let's say, for example, we have a table of bank clients with their ID numbers and their grades (numbers), and I want to classify them into groups that I don't know beforehand, based on the grade (the k). I don't think a loop is going to be useful here. It's almost the same for me: I just don't know the values in the Presence column and I want to assign every row to the groups found by the algorithm. So is there a solution for such an example? – Mamoud Oct 02 '15 at 09:13
  • And about your remark that the first component explains 100%: yeah, I know about that, and that it is one-dimensional data. So what are these other approaches that I should search for? – Mamoud Oct 02 '15 at 09:24
  • You are **not looking for clustering**. You are using the totally wrong tools, because you are not having a clustering problem. That is why things don't work for you. You want to *group* consecutive entries, but that is a simple *grouping* operation, not a clustering operation, sorry. Clustering is a hammer, but you need a knife. – Has QUIT--Anony-Mousse Oct 02 '15 at 10:32
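
The grouping of consecutive entries described in the comments above can be sketched in base R. This is only an illustration under assumptions: `tab` is a made-up hourly table with invented values, the names `runs`, `slots` and `present` are hypothetical, and run-length encoding via `rle()` is used in place of an explicit for loop.

```r
# Hypothetical stand-in for one day of data: one row per hour.
tab <- data.frame(
  Dat_Heure = as.POSIXct("2015-09-22 00:00:00", tz = "UTC") + 3600 * (0:23),
  Presence  = rep(c(0, 1, 0, 2, 0), times = c(8, 5, 2, 3, 6))
)

# Run-length encoding: one entry per block of consecutive equal values.
runs      <- rle(tab$Presence)
end_idx   <- cumsum(runs$lengths)          # last row of each block
start_idx <- end_idx - runs$lengths + 1    # first row of each block

# One row per time slot, with its first and last timestamp.
slots <- data.frame(
  Presence = runs$values,
  start    = tab$Dat_Heure[start_idx],
  end      = tab$Dat_Heure[end_idx],
  n_rows   = runs$lengths
)

# Keep only the slots where the person is present (Presence != 0).
present <- slots[slots$Presence != 0, ]
print(present)

# Step plot with the datetime on x and Presence on y.
plot(tab$Dat_Heure, tab$Presence, type = "s",
     xlab = "Dat_Heure", ylab = "Presence")
```

Unlike PAM, this keeps two separate blocks of zeros as separate slots, because it only merges rows that are adjacent in time. A threshold for ignoring short absences would have to be applied to `slots` afterwards.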