
I want to execute a function in R that comes from the following textbook (on p.20, but I posted it below): media.readthedocs.org/pdf/little-book-of-r-for-multivariate-analysis/latest/little-book-of-r-for-multivariate-analysis.pdf

The dataset I'm trying it on (the dataset used in this PDF) can be found here:

wine <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data",
                   sep=",")

The function is first defined as follows, and then executed (last line):

calcBetweenGroupsVariance <- function(variable,groupvariable)
{
# find out how many values the group variable can take
groupvariable2 <- as.factor(groupvariable[[1]])
levels <- levels(groupvariable2)
numlevels <- length(levels)
# calculate the overall grand mean:
grandmean <- mean(variable)
# get the mean and standard deviation for each group:
numtotal <- 0
denomtotal <- 0
for (i in 1:numlevels)
{
leveli <- levels[i]
levelidata <- variable[groupvariable==leveli,]
levelilength <- length(levelidata)
# get the mean and standard deviation for group i:
meani <- mean(levelidata)
sdi <- sd(levelidata)
numi <- levelilength * ((meani - grandmean)^2)
denomi <- levelilength
numtotal <- numtotal + numi
denomtotal <- denomtotal + denomi
}
# calculate the between-groups variance
Vb <- numtotal / (numlevels - 1)
Vb <- Vb[[1]]
return(Vb)
}
calcBetweenGroupsVariance (wine[2],wine[1])

It should give me the between-groups variance for the variable "V2" (second column) based on the three labels (first column). Unfortunately, R tells me:

[Image: R console output, not preserved]

The structure of the dataset looks like this:

[Image: str() output for the dataset, not preserved]

I don't know how to solve this. According to str(), the second column contains numerical data. I also tried this function on another dataset and ran into the same issue. I searched for this error message and there are quite a few topics about it, but I can't see how any of them relate to my problem.

If someone could give me a hint about what to do, I would be very grateful! If you need more information, please tell me.

Thanks a lot in advance,

Wilson Vargas
SCW16
  • Using `wine[1]` and `wine[2]` is probably not what you want. Try `wine[[1]]` and `wine[[2]]` – MrFlick Oct 10 '17 at 21:55
  • Likely duplicate of: https://stackoverflow.com/questions/1169456/the-difference-between-and-notations-for-accessing-the-elements-of-a-lis – MrFlick Oct 10 '17 at 21:56
  • Also it should be `groupvariable2 <- as.factor(groupvariable)` and `levelidata <- variable[groupvariable==leveli]` – MrFlick Oct 10 '17 at 22:00
  • @MrFlick: I followed your 3 suggestions and now it works fine.Thanks a lot for your answers! – SCW16 Oct 11 '17 at 15:38
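
The difference the comments point to can be seen on a small example. This is only a sketch: the data frame `df` below is a made-up stand-in for the wine data (no download needed), and the names `single`, `double`, `m_single` and `m_double` are hypothetical. It shows that single brackets keep a one-column data.frame, while double brackets extract the numeric vector that `mean()` expects.

```r
# Toy stand-in for the wine data; the values are made up for illustration.
df <- data.frame(V1 = c(1, 1, 2, 2, 3, 3),
                 V2 = c(14.2, 13.2, 12.4, 12.3, 13.7, 13.1))

single <- df[2]     # single brackets: still a data.frame with one column
double <- df[[2]]   # double brackets: the underlying numeric vector

class(single)       # a data.frame
class(double)       # a numeric vector

# mean() has no method for data.frames, so it warns and returns NA;
# the warning is suppressed here only to keep the output quiet.
m_single <- suppressWarnings(mean(single))
m_double <- mean(double)

is.na(m_single)     # the data.frame version yields NA
m_double            # the vector version yields an ordinary number
```

If the grand mean inside the function comes out as NA for this reason, every later calculation that uses it is NA as well, which would be consistent with the function failing even though str() reports a numeric column.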

2 Answers


Try adding `na.rm = TRUE` to your `grandmean <- mean(variable)` call.

Garrett

It looks like the authors of the book made some uncommon decisions about how to pass parameters to functions. In cases like this, it makes more sense (and is more generally useful) if you pass in a vector of data rather than requiring a user to pass in an entire data.frame. So here's a change both to the function itself and to how it's called that should get it to run.

calcBetweenGroupsVariance <- function(variable, groupvariable) {
  # find out how many values the group variable can take
  groupvariable2 <- as.factor(groupvariable)
  levels <- levels(groupvariable2)
  numlevels <- length(levels)
  # calculate the overall grand mean:
  grandmean <- mean(variable)
  # get the mean and standard deviation for each group:
  numtotal <- 0
  denomtotal <- 0
  for (i in 1:numlevels)
  {
    leveli <- levels[i]
    levelidata <- variable[groupvariable==leveli]
    levelilength <- length(levelidata)
    # get the mean and standard deviation for group i:
    meani <- mean(levelidata)
    sdi <- sd(levelidata)
    numi <- levelilength * ((meani - grandmean)^2)
    denomi <- levelilength
    numtotal <- numtotal + numi
    denomtotal <- denomtotal + denomi
  }
  # calculate the between-groups variance
  Vb <- numtotal / (numlevels - 1)
  Vb <- Vb[[1]]
  return(Vb)
}

and then call it with

calcBetweenGroupsVariance (wine[[2]], wine[[1]])
# or 
calcBetweenGroupsVariance (wine$V2, wine$V1)
MrFlick
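
As a sanity check on a function like the corrected one, the same quantity can be computed in a more compact way and compared against a one-way ANOVA. This is a sketch under assumptions, not part of the book or the answers: `between_groups_variance` is a hypothetical helper, and `x` and `g` are made-up toy data. It relies on the fact that the between-groups variance as defined here (sum of group size times squared deviation of the group mean from the grand mean, divided by the number of groups minus one) is the between-groups mean square of a one-way ANOVA.

```r
# Hypothetical compact version: same quantity as the loop-based function,
# computed with tapply() instead of an explicit for loop.
between_groups_variance <- function(x, g) {
  g <- droplevels(as.factor(g))      # ignore any unused factor levels
  n_i    <- tapply(x, g, length)     # number of observations per group
  mean_i <- tapply(x, g, mean)       # mean of x within each group
  sum(n_i * (mean_i - mean(x))^2) / (nlevels(g) - 1)
}

# Made-up toy data: six observations in three groups.
x <- c(14.2, 13.2, 12.4, 12.3, 13.7, 13.1)
g <- c(1, 1, 2, 2, 3, 3)

vb <- between_groups_variance(x, g)

# Cross-check: the first "Mean Sq" entry of a one-way ANOVA table is the
# between-groups mean square.
ms <- anova(lm(x ~ factor(g)))[["Mean Sq"]][1]

all.equal(vb, ms)
```

With the wine data loaded as in the question, the equivalent call would be `between_groups_variance(wine$V2, wine$V1)`.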