I just started using R. I need to plot the within-cluster variance from k-means clustering on a data set for 2 through 20 clusters.

Here is my code:

w <- numeric(20)
for (k in 2:20) {
  kf <- kmeans(whs2018annexBdatscl, k, nstart = 100)
  w[k] <- kf$tot.withinss
}
plot(2:20,w,type = "b", lwd= 2, pch= 19, xlab="K", ylab = expression(SS[within]))

I got the following error:

Error in xy.coords(x, y, xlabel, ylabel, log) : 'x' and 'y' lengths differ

When I plot against 1:20 it works, but I'm supposed to plot 2:20. What am I doing wrong?

JR0428
  • It should be `1:20` not `2:20` in your plot. There are 20 elements in w and so your x axis should have 20 elements, since total sum of squares and within sum of squares are same for the first step. You are starting in loop from 2, but w does contain the 1 iteration sum of squares value. – PKumar Apr 09 '21 at 13:59
  • FYI, you have a habit recently of asking questions without the ability for us to reproduce them. We don't have your data, have no idea how it is structured, and therefore have to speculate. That is contributing to your long run of unanswered questions. I strongly suggest you adapt how you ask questions to make them self-contained and reproducible; see https://stackoverflow.com/q/5963269, [mcve], and https://stackoverflow.com/tags/r/info. Thanks, and good luck. – r2evans Apr 09 '21 at 14:10

1 Answer

You never assign to w[1] (the loop starts at k = 2), so w has 20 elements while 2:20 has only 19. Drop the unused first element when plotting:

plot(2:20, w[-1],
     type = "b", lwd= 2, pch= 19, xlab="K", ylab = expression(SS[within]))

The rationale for the error is straightforward: if plot(1:2, 3:4) plots two points, what should plot(c(1,2,3), c(4,5)) plot? The x and y vectors need to be the same length, and this is one place where R does not "recycle" its arguments (for better or worse).
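One way to avoid the mismatch altogether is to build w directly from the vector of cluster counts, so the two lengths agree by construction. This is only a sketch: the scaled iris measurements stand in for whs2018annexBdatscl (which we don't have), and the names dat and ks, along with nstart = 20 and iter.max = 50, are choices made for illustration.

```r
# Reproducible sketch; built-in data stand in for the asker's data set.
set.seed(42)                      # kmeans uses random starting centers
dat <- scale(iris[, 1:4])         # assumption: a numeric, scaled matrix

ks <- 2:20                        # the cluster counts to evaluate

# One tot.withinss value per k, so length(w) == length(ks)
w <- sapply(ks, function(k) {
  kmeans(dat, centers = k, nstart = 20, iter.max = 50)$tot.withinss
})

plot(ks, w, type = "b", lwd = 2, pch = 19,
     xlab = "K", ylab = expression(SS[within]))
```

Because ks is used both to compute w and as the x-axis, changing the range (say, to 3:15) cannot put the two vectors out of step.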

r2evans