3

I am running Fuzzy C-Means Clustering using e1071 package. I want to decide the optimum number of clusters based on fuzzy performance index (FPI) (extent of fuzziness) and normalized classification entropy (NCE) (degree of disorganization of specific class) given in the following formula

enter image description here

where c is the number of clusters and n is the number of observations, μik is the fuzzy membership and loga is the natural logarithm.

I am using the following code

library(e1071)
x <- rbind(matrix(rnorm(100,sd=0.3),ncol=2),
         matrix(rnorm(100,mean=1,sd=0.3),ncol=2))
cl <- cmeans(x,2,20,verbose=TRUE,method="cmeans")
cl$membership

I have been able to extract the μik i.e. fuzzy membership. Now, cmeans has to for different number of clusters e.g. 2 to 6 and the FPI and NCE has to be calculated to have a plot like the following

enter image description here

How can it be achieved in R?

Edit

I have tried the code provided by @nya for iris dataset using the following code

df <- scale(iris[-5])

FPI <- function(cmem){
  c <- ncol(cmem)
  n <- nrow(cmem)
  
  1 - (c / (c - 1)) * (1 - sum(cmem^2) / n)
}

NCE <- function(cmem){
  c <- ncol(cmem)
  n <- nrow(cmem)
  
  (n / (n - c)) * (- sum(cmem * log(cmem)) / n)
}

# prepare variables
cl <- list()
fpi <- nce <- NULL

# cycle through the desired number of clusters
for(i in 2:6){
  cl[[i]] <- cmeans(df, i, 20, method = "cmeans")
  fpi <- c(fpi, FPI(cl[[i]]$membership))
  nce <- c(nce, NCE(cl[[i]]$membership))
}

# add space for the second axis label
par(mar = c(5,4,1,4) + .1)

# plot FPI
plot(2:6, fpi, lty = 2, pch = 18, type = "b", xlab = "Number of clusters", ylab = "FPI")

# plot NCE, manually adding the second axis
par(new = TRUE)
plot(2:6, nce, lty = 1, pch = 15, type = "b", xlab = "", ylab = "", axes = FALSE)
axis(4, at = pretty(range(nce)))
mtext("NCE", side = 4, line = 3)

# add legend
legend("top", legend = c("FPI", "NCE"), pch = c(18,15), lty = c(2,1), horiz = TRUE)

enter image description here

The minimum values of fuzzy performance index(FPI) and normalized classification entropy (NCE) were considered to decide the optimum number of clusters. NCE is always increasing and FPI is showing the decreasing value. Ideally it should have been

enter image description here

UseR10085
  • 7,120
  • 3
  • 24
  • 54

2 Answers2

5

With available equations, we can program our own functions. Here, the two functions use equations present in the paper you suggested and one of the references the authors cite.

FPI <- function(cmem, method = c("FuzME", "McBrathney", "Rahul")){
    method = match.arg(method)
    C <- ncol(cmem)
    N <- nrow(cmem)

    # Rahul et al. 2019. https://doi.org/10.1080/03650340.2019.1578345
    if(method == "Rahul"){
        res <- 1 - (C / (C - 1)) * (1 - sum(cmem^2) / N)
    }
    # McBrathney & Moore 1985 https://doi.org/10.1016/0168-1923(85)90082-6
    if(method == "McBrathney"){
        F <- sum(cmem^2) / N
        res <- 1 - (C * F - 1) / (F - 1)
    }
    # FuzME https://precision-agriculture.sydney.edu.au/resources/software/
    # MATLAB code file fvalidity.m, downloaded on 11 Nov, 2021 
    if(method == "FuzME"){
        F <- sum(cmem^2) / N
        res <- 1 - (C * F - 1) / (C - 1)
    }
    return(res)
}

NCE <- function(cmem, method = c("FuzME", "McBrathney", "Rahul")){
    method = match.arg(method)
    C <- ncol(cmem)
    N <- nrow(cmem)

    if(method == "Rahul"){
        res <- (N / (N - C)) * (- sum(cmem * log(cmem)) / N)
    }
    if(method %in% c("FuzME", "McBrathney")){
        H <- -1 / N * sum(cmem * log(cmem)) 
        res <- H / log(C)
    }
    return(res)
}

Then use those to calculate the indices from the degrees of membership from the cmeans function from the iris dataset.

# prepare variables
cl <- list()
fpi <- nce <- NULL

# cycle through the desired number of clusters
for(i in 2:6){
    cl[[i]] <- e1071::cmeans(iris[, -5], i, 20, method = "cmeans")
    fpi <- c(fpi, FPI(cl[[i]]$membership, method = "M"))
    nce <- c(nce, NCE(cl[[i]]$membership, method = "M"))
}

Last, plot with two different axes in one plot.

# add space for the second axis label
par(mar = c(5,4,1,4) + .1)

# plot FPI
plot(2:6, fpi, lty = 2, pch = 18, type = "b", xlab = "Number of clusters", ylab = "FPI")

# plot NCE, manually adding the second axis
par(new = TRUE)
plot(2:6, nce, lty = 1, pch = 15, type = "b", xlab = "", ylab = "", axes = FALSE)
axis(4, at = pretty(range(nce)))
mtext("NCE", side = 4, line = 3)

# add legend
legend("top", legend = c("FPI", "NCE"), pch = c(18,15), lty = c(2,1), horiz = TRUE)

enter image description here

EDIT1: Updated the functions according to optional equations from two different publications and calculated the example on the iris dataset.

EDIT2: Added code for the FPI and NCE calculations specified in the FuzME MATLAB code available here.

nya
  • 2,138
  • 15
  • 29
  • I have tested your code with `iris` dataset. The minimum values of fuzzy performance index(FPI) and normalized classification entropy (NCE) were considered to decide the optimum number of clusters. NCE is always increasing and FPI is showing the decreasing value. I have updated my question. Can you please see to it? – UseR10085 Nov 11 '21 at 12:12
  • @BappaDas Can you check that the functions `FPI` and `NCE` are implemented correctly? Specifically, the `c` and `n` arguments should be the number of clusters and the number of observations. Do those numbers correspond to the dimensions of `cl$membership`? I.e. for the `iris` dataset `n = 150` and is 2,3,4,5,6. – nya Nov 11 '21 at 12:28
  • Alternatively, what is the code for producing the values in the reference image in your question? – nya Nov 11 '21 at 12:36
  • I have not coded it. I have taken it from [this](https://krishi.icar.gov.in/jspui/bitstream/123456789/31927/1/Assessing%20soil_R%20Tripathi.pdf) manuscript. – UseR10085 Nov 11 '21 at 12:48
  • The equations differ from publication to publication (I checked the two specified in the edited answer). The answer now shows the alternative from McBrathney & Moore 1985. However, the software they use (FuzMe) implements both indices differently, not matching any of the publications. Do you want the FuzMe option as well? – nya Nov 11 '21 at 14:11
  • Actually I am not able to use FuzMe software in windows 10 software. If you can show FuzMe option as well, it will be great. – UseR10085 Nov 11 '21 at 14:42
  • @BappaDas The FuzME methods are now included in the code. However, note that the McBrathney & Moore paper and the MATLAB code of FuzME use the same equation for the NCE calculation. If my answer helped with your problem, please award the bounty. – nya Nov 15 '21 at 07:33
0

Hope this could help

library(dplyr)
library(ggplot2)

f <- function(cl) {
  C <- length(cl$size)
  N <- sum(cl$size)
  mu <- cl$membership
  fpi <- 1 - C / (C - 1) * (1 - sum((mu)^2) / N)
  nce <- N / (N - C) * (-sum(log(mu) * mu) / N)
  c(FPI = fpi, NCE = nce)
}

data.frame(t(rbind(
  K = 2:6,
  sapply(
    K,
    function(k) f(cmeans(x, k, 20, verbose = TRUE, method = "cmeans"))
  )
))) %>%
  pivot_longer(cols = FPI:NCE, names_to = "Index") %>%
  ggplot(aes(x = K, y = value, group = Index)) +
  geom_line(aes(linetype = Index, color = Index)) +
  geom_point() +
  scale_y_continuous(
    name = "FPI",
    sec.axis = sec_axis(~., name = "NCE")
  ) +
  theme(legend.position = "top")

enter image description here

ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81