
I just found some quite weird behaviour when subsetting a data.frame with a condition. The problem is that when I use a factor (leveled) variable in the condition, the subset contains the rows for all levels instead of only the matching one.

Here is an example where the problem does not occur:

data <- data.frame(names = c("Bob", "Alice", "Joe"), ages = c(18, 20, 43), sizes = c(180, 160, 176), group = c("0021", "9430", "0021"))

for(i in 1:length(data$names)){
  print(paste("Name: ", data$names[i], sep=""))

  # print out all group members
  print("Group Members:")
  group = subset.data.frame(data, data$group == data$group[i])
  for(j in 1:length(group$names)){
    print(paste("Name: ", group$names[j], sep=""))
  }
  print("---------------------------------")
}

Now I save `data$group[i]` in a variable first, and the data.frame does not get filtered at all:

data <- data.frame(names = c("Bob", "Alice", "Joe"), ages = c(18, 20, 43), sizes = c(180, 160, 176), group = c("0021", "9430", "0021"))

for(i in 1:length(data$names)){
  print(paste("Name: ", data$names[i], sep=""))
  group <- data$group[i]

  # print out all group members
  print("Group Members:")
  group = subset.data.frame(data, data$group == group)
  for(j in 1:length(group$names)){
    print(paste("Name: ", group$names[j], sep=""))
  }
  print("---------------------------------")
}

Can someone please explain why this unexpected behaviour occurs? I expected `data$group[i]` to return a string, but I get a factor value with levels instead.

Gregor Thomas
  • The problem is your (wrong) use of `subset`. You should carefully study `help("subset")` including the examples and then stop using it. Use `data[data$group == group,]` instead. That does the scoping you expect. – Roland Nov 27 '19 at 08:01
  • @Roland thank you for the answer. I figured out, that `as.character(data$group)[i]` does convert the factor "value" to a string. But thanks for the hint, I am going to find out. – David Adamson Nov 27 '19 at 08:06
  • This has nothing to do with the `factor` class. This is an issue with `subset`'s non-standard evaluation and the scoping following from it. `subset` takes `group` from `data` and not from the enclosing environment. You also should not use `$` within `subset`. – Roland Nov 27 '19 at 08:09

1 Answer


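Nothing is wrong with your code as such. `subset` evaluates the condition inside the data frame first, so when a variable has the same name as a column, the column wins: in `data$group == group`, `group` refers to the column `data$group`, and the condition is `TRUE` for every row. If you give your `group` variable any other name, it works fine: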

for(i in 1:length(data$names)){
  print(paste("Name: ", data$names[i], sep=""))
  temp <- data$group[i] #Change here

  # print out all group members
  print("Group Members:")
  group = subset(data, group == temp)
  for(j in 1:length(group$names)){
    print(paste("Name: ", group$names[j], sep=""))
  }
  print("---------------------------------")
}

This is also the reason why you should read "Why is `[` better than `subset`?".
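
As a minimal sketch of the scoping difference (the cut-down data frame and the names `all_rows` and `members` are only for illustration, not part of the original code), the following compares `subset` with bracket indexing when a variable shares its name with a column:

```r
# Reduced version of the data from the question; stringsAsFactors = FALSE
# is set explicitly so the result is the same on old and new R versions.
data <- data.frame(
  names = c("Bob", "Alice", "Joe"),
  group = c("0021", "9430", "0021"),
  stringsAsFactors = FALSE
)

# A variable with the same name as the column `group`.
group <- data$group[1]

# Inside subset(), `group` is looked up in the data frame first, so the
# condition compares the column with itself and keeps every row.
all_rows <- subset(data, data$group == group)
nrow(all_rows)   # 3

# Bracket indexing uses ordinary scoping: `group` is the variable above.
members <- data[data$group == group, ]
members$names    # "Bob" "Joe"
```

With `[`, the condition is evaluated in the calling environment, so no renaming of the variable is needed.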

Ronak Shah