
I want to run a cluster analysis on certain columns (variables), say var5 to var10. For that I used pvclust in R. Now I want to add the resulting cluster assignments as a new column in the original data frame. Can anybody help me fix this? The code I used is given below:

group <- sqldf("select cq14x1_1,cq14x1_2,cq14x1_3,cq14x1_4,cq14x1_5,cq14x1_6,cq14x1_7 from parma_1")
fit_1 <- pvclust(group,method.hclust="ward",method.dist="euclidean")
group_2 <- (fit_1,alpha=.90)
Argalatyr
Beta
  • 2
    Reading the help files for `pvclust` in package `pvclust`, it seems to me that `pvclust` calculates the p-values for clustering. The underlying clustering is actually done using `hclust`. See `?hclust` and its examples for help on how to do hierarchical cluster analysis. – Andrie Jun 11 '11 at 07:18
  • -1 for using sqldf for stuff which can be made trivially and way faster using base R ;-) – mbq Jun 12 '11 at 16:14
  • 1
    I use sqldf because I'm more comfortable with SQL queries. I don't see why you would mark someone down for a preference. – Beta Jun 13 '11 at 14:19

2 Answers

The output of the pvclust function is an object that contains an hclust element (check out the Value section of its help page). The hclust element is basically a tree representation of the clustering (described here), and it can be passed to the cutree function, which produces group memberships. Have a look at the doc page of cutree. You need these three functions (pvclust, hclust and cutree) to produce the actual cluster memberships of your original data, which can then easily be added to your dataframe as @nico suggested.
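To make the hclust and cutree step concrete, here is a minimal sketch in base R only. The data frame `df` and the choice of `k = 3` are made up for illustration and are not from the question. With pvclust, `fit_1$hclust` would take the place of `hc`. Note that, as far as I can tell from its help page, pvclust clusters the columns of its input, so you would get one label per variable unless the data is transposed first.

```r
set.seed(42)

# Hypothetical stand-in for the asker's data: 20 rows, 3 numeric columns.
df <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20))

# hclust() takes a distance matrix; dist() computes distances between rows.
hc <- hclust(dist(df, method = "euclidean"), method = "average")

# cutree() returns one integer label per row, in the original row order.
memberships <- cutree(hc, k = 3)

# The labels line up with the rows, so they can be added as a column.
df$cluster <- memberships
```

Because cutree returns the labels in the order of the original observations, no reordering is needed before attaching them.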

davidski

If the problem is adding a column to a dataframe, just use:

yourdataframe <- cbind(yourdataframe, newcolumn)
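A small sketch of that, with made-up values (the contents of `yourdataframe` and `newcolumn` here are only placeholders). The one thing cbind requires is that the new column has as many elements as the data frame has rows:

```r
# Hypothetical data frame and cluster labels, for illustration only.
yourdataframe <- data.frame(x = 1:5, y = letters[1:5])
newcolumn <- c(2L, 1L, 1L, 3L, 2L)

# cbind() fails with "arguments imply differing number of rows"
# if the lengths do not match, so check first.
stopifnot(length(newcolumn) == nrow(yourdataframe))

yourdataframe <- cbind(yourdataframe, cluster = newcolumn)
# Equivalent alternative: yourdataframe$cluster <- newcolumn
```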

If that's not your problem, try clarifying the question.

nico
  • Actually, I want to add the column that defines the new clusters to the main dataset, and I was using cbind for that. Unfortunately, it gives an error: `newdataset = cbind(group, group_2)` Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 199, 0 – Beta Jun 13 '11 at 15:36
  • @user697363: well... you're not assigning anything to `group_2`. You put some parentheses in the code, but you're not calling any function, so it has length 0 and you cannot `cbind` it to the data frame. – nico Jun 13 '11 at 15:43
  • Can you please tell me how to fix this? I want to add a new column to the present dataset, where the new column contains the clusters from pvclust. I can do it using hclust, but as I mentioned above, I want to use pvclust rather than hclust. – Beta Jun 13 '11 at 15:53
  • The culprit is `group_2 <- (fit_1,alpha=.90)`... I'm not sure what that's supposed to do... I never used `pvclust`, but maybe you want to check the result of `str(fit_1)`. I assume `fit_1` will contain information about clusters. – nico Jun 13 '11 at 17:19
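For what it's worth, the `alpha=.90` argument suggests the missing function may have been `pvpick`, which pvclust provides for picking clusters whose p-value is at or above a threshold. That is a guess, not something confirmed in the thread. A rough sketch under that assumption, using made-up data in place of `group` and a small `nboot` to keep it quick: since pvclust clusters columns, the data is transposed so that rows are clustered, and rows that fall in no significant cluster are left as NA.

```r
library(pvclust)  # assumed to be installed

set.seed(1)

# Hypothetical stand-in for `group`: 30 rows, 4 numeric columns.
group <- as.data.frame(matrix(rnorm(120), nrow = 30))
rownames(group) <- paste0("r", seq_len(nrow(group)))

# pvclust() clusters the columns of its input, so transpose to cluster rows.
fit <- pvclust(t(group), method.hclust = "average",
               method.dist = "euclidean", nboot = 100)

# Pick clusters whose AU p-value is at least 0.90.
picked <- pvpick(fit, alpha = 0.90)

# One label per row; NA where a row is in no picked cluster.
group$pv_cluster <- NA_integer_
for (i in seq_along(picked$clusters)) {
  in_cluster <- rownames(group) %in% picked$clusters[[i]]
  group$pv_cluster[in_cluster] <- i
}
```

With pure noise like this there may be no significant clusters at all, in which case the whole column stays NA; on real data the loop assigns the index of the picked cluster to each member row.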