How to calculate fuzzy performance index and normalized classification entropy in R

Question

I am running Fuzzy C-Means Clustering using e1071 package. I want to decide the optimum number of clusters based on fuzzy performance index (FPI) (extent of fuzziness) and normalized classification entropy (NCE) (degree of disorganization of specific class) given in the following formula

where c is the number of clusters and n is the number of observations, μ_ik is the fuzzy membership and log_a is the natural logarithm.

I am using the following code

library(e1071)
x <- rbind(matrix(rnorm(100,sd=0.3),ncol=2),
         matrix(rnorm(100,mean=1,sd=0.3),ncol=2))
cl <- cmeans(x,2,20,verbose=TRUE,method="cmeans")
cl$membership

I have been able to extract the μ_ik i.e. fuzzy membership. Now, cmeans has to for different number of clusters e.g. 2 to 6 and the FPI and NCE has to be calculated to have a plot like the following

How can it be achieved in R?

Edit

I have tried the code provided by @nya for iris dataset using the following code

df <- scale(iris[-5])

FPI <- function(cmem){
  c <- ncol(cmem)
  n <- nrow(cmem)
  
  1 - (c / (c - 1)) * (1 - sum(cmem^2) / n)
}

NCE <- function(cmem){
  c <- ncol(cmem)
  n <- nrow(cmem)
  
  (n / (n - c)) * (- sum(cmem * log(cmem)) / n)
}

# prepare variables
cl <- list()
fpi <- nce <- NULL

# cycle through the desired number of clusters
for(i in 2:6){
  cl[[i]] <- cmeans(df, i, 20, method = "cmeans")
  fpi <- c(fpi, FPI(cl[[i]]$membership))
  nce <- c(nce, NCE(cl[[i]]$membership))
}

# add space for the second axis label
par(mar = c(5,4,1,4) + .1)

# plot FPI
plot(2:6, fpi, lty = 2, pch = 18, type = "b", xlab = "Number of clusters", ylab = "FPI")

# plot NCE, manually adding the second axis
par(new = TRUE)
plot(2:6, nce, lty = 1, pch = 15, type = "b", xlab = "", ylab = "", axes = FALSE)
axis(4, at = pretty(range(nce)))
mtext("NCE", side = 4, line = 3)

# add legend
legend("top", legend = c("FPI", "NCE"), pch = c(18,15), lty = c(2,1), horiz = TRUE)

The minimum values of fuzzy performance index(FPI) and normalized classification entropy (NCE) were considered to decide the optimum number of clusters. NCE is always increasing and FPI is showing the decreasing value. Ideally it should have been

nya · Accepted Answer · 2021-11-15T07:34:34.657

With available equations, we can program our own functions. Here, the two functions use equations present in the paper you suggested and one of the references the authors cite.

FPI <- function(cmem, method = c("FuzME", "McBrathney", "Rahul")){
    method = match.arg(method)
    C <- ncol(cmem)
    N <- nrow(cmem)

    # Rahul et al. 2019. https://doi.org/10.1080/03650340.2019.1578345
    if(method == "Rahul"){
        res <- 1 - (C / (C - 1)) * (1 - sum(cmem^2) / N)
    }
    # McBrathney & Moore 1985 https://doi.org/10.1016/0168-1923(85)90082-6
    if(method == "McBrathney"){
        F <- sum(cmem^2) / N
        res <- 1 - (C * F - 1) / (F - 1)
    }
    # FuzME https://precision-agriculture.sydney.edu.au/resources/software/
    # MATLAB code file fvalidity.m, downloaded on 11 Nov, 2021 
    if(method == "FuzME"){
        F <- sum(cmem^2) / N
        res <- 1 - (C * F - 1) / (C - 1)
    }
    return(res)
}

NCE <- function(cmem, method = c("FuzME", "McBrathney", "Rahul")){
    method = match.arg(method)
    C <- ncol(cmem)
    N <- nrow(cmem)

    if(method == "Rahul"){
        res <- (N / (N - C)) * (- sum(cmem * log(cmem)) / N)
    }
    if(method %in% c("FuzME", "McBrathney")){
        H <- -1 / N * sum(cmem * log(cmem)) 
        res <- H / log(C)
    }
    return(res)
}

Then use those to calculate the indices from the degrees of membership from the cmeans function from the iris dataset.

# prepare variables
cl <- list()
fpi <- nce <- NULL

# cycle through the desired number of clusters
for(i in 2:6){
    cl[[i]] <- e1071::cmeans(iris[, -5], i, 20, method = "cmeans")
    fpi <- c(fpi, FPI(cl[[i]]$membership, method = "M"))
    nce <- c(nce, NCE(cl[[i]]$membership, method = "M"))
}

Last, plot with two different axes in one plot.

# add space for the second axis label
par(mar = c(5,4,1,4) + .1)

# plot FPI
plot(2:6, fpi, lty = 2, pch = 18, type = "b", xlab = "Number of clusters", ylab = "FPI")

# plot NCE, manually adding the second axis
par(new = TRUE)
plot(2:6, nce, lty = 1, pch = 15, type = "b", xlab = "", ylab = "", axes = FALSE)
axis(4, at = pretty(range(nce)))
mtext("NCE", side = 4, line = 3)

# add legend
legend("top", legend = c("FPI", "NCE"), pch = c(18,15), lty = c(2,1), horiz = TRUE)

EDIT1: Updated the functions according to optional equations from two different publications and calculated the example on the iris dataset.

EDIT2: Added code for the FPI and NCE calculations specified in the FuzME MATLAB code available here.

I have tested your code with `iris` dataset. The minimum values of fuzzy performance index(FPI) and normalized classification entropy (NCE) were considered to decide the optimum number of clusters. NCE is always increasing and FPI is showing the decreasing value. I have updated my question. Can you please see to it? — UseR10085, Nov 11 '21 at 12:12
@BappaDas Can you check that the functions `FPI` and `NCE` are implemented correctly? Specifically, the `c` and `n` arguments should be the number of clusters and the number of observations. Do those numbers correspond to the dimensions of `cl$membership`? I.e. for the `iris` dataset `n = 150` and is 2,3,4,5,6. — nya, Nov 11 '21 at 12:28
Alternatively, what is the code for producing the values in the reference image in your question? — nya, Nov 11 '21 at 12:36
I have not coded it. I have taken it from [this](https://krishi.icar.gov.in/jspui/bitstream/123456789/31927/1/Assessing%20soil_R%20Tripathi.pdf) manuscript. — UseR10085, Nov 11 '21 at 12:48
The equations differ from publication to publication (I checked the two specified in the edited answer). The answer now shows the alternative from McBrathney & Moore 1985. However, the software they use (FuzMe) implements both indices differently, not matching any of the publications. Do you want the FuzMe option as well? — nya, Nov 11 '21 at 14:11
Actually I am not able to use FuzMe software in windows 10 software. If you can show FuzMe option as well, it will be great. — UseR10085, Nov 11 '21 at 14:42
@BappaDas The FuzME methods are now included in the code. However, note that the McBrathney & Moore paper and the MATLAB code of FuzME use the same equation for the NCE calculation. If my answer helped with your problem, please award the bounty. — nya, Nov 15 '21 at 07:33

score 0 · Answer 2 · answered Nov 11 '21 at 11:29

Hope this could help

library(dplyr)
library(ggplot2)

f <- function(cl) {
  C <- length(cl$size)
  N <- sum(cl$size)
  mu <- cl$membership
  fpi <- 1 - C / (C - 1) * (1 - sum((mu)^2) / N)
  nce <- N / (N - C) * (-sum(log(mu) * mu) / N)
  c(FPI = fpi, NCE = nce)
}

data.frame(t(rbind(
  K = 2:6,
  sapply(
    K,
    function(k) f(cmeans(x, k, 20, verbose = TRUE, method = "cmeans"))
  )
))) %>%
  pivot_longer(cols = FPI:NCE, names_to = "Index") %>%
  ggplot(aes(x = K, y = value, group = Index)) +
  geom_line(aes(linetype = Index, color = Index)) +
  geom_point() +
  scale_y_continuous(
    name = "FPI",
    sec.axis = sec_axis(~., name = "NCE")
  ) +
  theme(legend.position = "top")

How to calculate fuzzy performance index and normalized classification entropy in R

2 Answers2