
How do I combine the tapply command with 'not in' logic?

Objective: Obtain the median sepal length for each species.

tapply(iris$Sepal.Length, iris$Species, median)

Constraint: Remove entries with a petal width of 1.3 or 1.5.

!iris$Petal.Width %in% c('1.3', '1.5')
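Base R has no "not in" operator, but a common idiom builds one with `Negate()`. Below is a minimal sketch; the name `%notin%` is a hypothetical helper, not part of base R. It also compares against numbers rather than the strings `'1.3'` and `'1.5'`. The string version still works because `%in%` coerces both sides to a common type (character) before matching.

```r
# Hypothetical helper: base R has no "not in" operator, so build one from %in%.
`%notin%` <- Negate(`%in%`)

# Compare numbers with numbers; the question's c('1.3', '1.5') only works
# because %in% coerces Petal.Width to character before matching.
drop_widths <- c(1.3, 1.5)
keep <- iris$Petal.Width %notin% drop_widths

# Same logical mask as the question's !iris$Petal.Width %in% c('1.3', '1.5')
print(identical(keep, !iris$Petal.Width %in% c('1.3', '1.5')))

# Number of rows the constraint removes
print(sum(!keep))
```

Note that `!` binds more loosely than `%in%`, so `!x %in% y` already means `!(x %in% y)`.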

Attempt:

tapply(iris$Sepal.Length, iris$Species, median[!iris$Petal.Width %in% c('1.3', '1.5')])

Result: error message "object of type 'closure' is not subsettable".

---

My attempt here with the iris dataset is a stand-in for my own dataset. I tried the same approach on my own data and received the same error message. I imagine something is wrong with my syntax. What is it?

bubbalouie
`median[!iris$Petal.Width %in% c('1.3', '1.5')]`: you are subsetting a function here, which yields an error. You can't use [ ] on functions. – maRtin May 11 '15 at 21:35
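To see where the error comes from, the snippet below is a minimal sketch that inspects `median` directly. Functions in R have type "closure", and applying `[` to one raises exactly the message from the question. The variable name `msg` is illustrative only.

```r
# median is a function object (type "closure"), not a vector of values.
print(typeof(median))

# Applying `[` to a function fails with the error from the question.
msg <- tryCatch(median[1], error = function(e) conditionMessage(e))
print(msg)
```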

2 Answers


Try

with(iris[!iris$Petal.Width %in% c('1.3', '1.5'),], tapply(Sepal.Length, Species, median))
# setosa versicolor  virginica 
#    5.0        5.8        6.5 

The idea here is to operate on the subsetted data in the first place.

Your line didn't work because FUN must be a function, which tapply applies to each group of X (Sepal.Length in your case) rather than to the whole data set. `median[...]` instead tries to subset the median function itself.
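The same "subset first" idea can be written without `with()`. The sketch below shows two alternatives that are not from the original answer: subsetting X and INDEX in parallel before calling `tapply()`, and using the `subset` argument of the formula interface of `aggregate()`. Both should agree with the accepted answer. The names `keep`, `by_tapply` and `by_aggregate` are illustrative.

```r
keep <- !iris$Petal.Width %in% c(1.3, 1.5)

# Option 1: subset X and INDEX with the same mask, then let tapply group them.
by_tapply <- tapply(iris$Sepal.Length[keep], iris$Species[keep], median)
print(by_tapply)

# Option 2: aggregate()'s formula interface evaluates `subset` inside `data`.
by_aggregate <- aggregate(Sepal.Length ~ Species, data = iris, FUN = median,
                          subset = !Petal.Width %in% c(1.3, 1.5))
print(by_aggregate)

# Both return one median per species, in factor-level order.
print(all.equal(as.numeric(by_tapply), by_aggregate$Sepal.Length))
```

`tapply()` returns a named array, while `aggregate()` returns a data frame. The data frame is often easier to pass on to further processing.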

David Arenburg

This is the workaround you should not do:

tapply(
  1:nrow(iris),
  iris$Species,
  function(i) median(iris$Sepal.Length[
    (1:nrow(iris) %in% i) &
      !(iris$Petal.Width %in% c('1.3', '1.5'))
  ])
)

Things get ugly if you subset after splitting the vector in this way. You effectively have to

  • split it again (when using 1:nrow(iris) %in% i) and
  • compute the subset once for each value of iris$Species (when using !(iris$Petal.Width %in% c('1.3', '1.5'))).
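If filtering really has to happen after the split, one less wasteful variant filters each group's own index vector `i` instead of re-testing every row of the data set. This is a sketch of an alternative, not part of the original answer; subsetting first, as in the accepted answer, remains simpler. `drop_widths` and `res` are illustrative names.

```r
drop_widths <- c(1.3, 1.5)

# tapply() passes FUN the row indices belonging to each species, so the
# filter only needs to look at those rows.
res <- tapply(seq_len(nrow(iris)), iris$Species, function(i) {
  i <- i[!iris$Petal.Width[i] %in% drop_widths]  # keep this group's allowed rows
  median(iris$Sepal.Length[i])
})
print(res)
```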
Frank