Compute centroid for each group using dplyr

Question

For each cluster in temp3, I want to compute its centroid. Ultimately, I would like to plot the cluster number at its centroid's coordinates.

Data:

> head(temp3)
                             X         Y Transcripts Genes Timepoint Run Cluster
6B_0_GACCGCGATATT -102.1425877 13.944831      134028 11269     Day 0  6B       2
6B_0_ATTGCGGAGACA  -38.6617527  0.600154      106849 10947     Day 0  6B       3
6B_0_ATGGTCACCACT  -23.3275424 34.178312      105817 10495     Day 0  6B       4
6B_0_ATATTGCTAATC   -0.6069128 52.449397       79920  9650     Day 0  6B       4
6B_0_ATCTAATCTACC   -0.4738788 54.756711       72912  9294     Day 0  6B       4
6B_0_CGCAGTGTGCCC  108.5333675 76.637930       70132  9291     Day 0  6B       6

Code:

library(dplyr)
temp3 %>% group_by(Cluster) %>% mutate(., Centroid=rowMeans(cbind(.$X, .$Y), na.rm = TRUE))

Which returns:

Error: incompatible size (13792), expecting 198 (the group size) or 1

EDIT:

Another approach:

library(cluster)
temp3 %>% group_by(Cluster) %>% mutate(., Centroid=pam(cbind(.$X, .$Y), 1)$medoids)

returns:

Error: incompatible size (2), expecting 198 (the group size) or 1

Relevant post: http://stackoverflow.com/questions/3505701/r-grouping-functions-sapply-vs-lapply-vs-apply-vs-tapply-vs-by-vs-aggrega and http://gis.stackexchange.com/a/6026/61922 — zx8754, Oct 26 '16 at 16:52

score 1 · Accepted Answer · answered Oct 26 '16 at 16:34

How about just

temp3 %>% group_by(Cluster) %>% mutate(meanX=mean(X), meanY=mean(Y))

if you want a result with the same dimensions as the input.
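
The reason this works where the original attempt failed: inside a pipe, `.` is the whole data frame, so `.$X` always has every row (13792 here) no matter how the data is grouped, while a bare `X` is evaluated separately within each group. A minimal sketch of the difference, using a made-up `toy` data frame as a stand-in for temp3 (the values are invented for illustration):

```r
library(dplyr)

# Toy stand-in for temp3; the numbers are made up for illustration only
toy <- data.frame(
  X = c(0, 2, 10, 14),
  Y = c(0, 4, 10, 20),
  Cluster = c(1, 1, 2, 2)
)

# Bare column names are evaluated within each group, so mean(X) is the
# per-cluster mean, recycled to every row of that cluster
with_centroids <- toy %>%
  group_by(Cluster) %>%
  mutate(meanX = mean(X), meanY = mean(Y)) %>%
  ungroup()

# By contrast, toy$X (which is what .$X refers to in the pipe) is always
# the full column, regardless of grouping
length(toy$X)  # 4, the total number of rows

print(with_centroids)
```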

Or, if you just want one row per cluster (which seems more likely):

temp3 %>% group_by(Cluster) %>% summarise(meanX=mean(X), meanY=mean(Y))
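
For the plotting goal mentioned in the question, the one-row-per-cluster result can be passed to `text()`. This is only a sketch with base graphics, again on a made-up `toy` data frame rather than the real temp3; the colours and point style are arbitrary choices:

```r
library(dplyr)

# Toy stand-in for temp3; the numbers are made up for illustration only
toy <- data.frame(
  X = c(0, 2, 10, 14),
  Y = c(0, 4, 10, 20),
  Cluster = c(1, 1, 2, 2)
)

# One row per cluster: the centroid is the mean of X and the mean of Y
centroids <- toy %>%
  group_by(Cluster) %>%
  summarise(meanX = mean(X), meanY = mean(Y))

# Draw the points, then write each cluster number at its centroid
plot(toy$X, toy$Y, col = "grey", pch = 16, xlab = "X", ylab = "Y")
text(centroids$meanX, centroids$meanY, labels = centroids$Cluster, font = 2)
```

With the real data, replacing `toy` with `temp3` should be all that is needed, assuming `X`, `Y` and `Cluster` are columns as shown in the `head(temp3)` output.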
