hits <- vapply(titles,
               FUN = grepl,
               FUN.VALUE = logical(length(pass_names)),
               pass_names)

titles is a vector of titles such as "mr", and pass_names is a vector of names.

Two questions:

  1. I don't understand the resulting matrix, hits.
  2. I don't understand why the last argument is pass_names, nor how I am supposed to know about these four arguments. ?vapply documents X, FUN and FUN.VALUE, but I cannot figure out how I would know that pass_names needs to be passed there.

I have looked online and could not find an answer, so I hope this will help others too. Thank you in advance for your answers; yes, I am a beginner.


Extra info: this question uses the titanic package in R. pass_names is just titanic$Name, and titles is just paste(",", c("Mr\\.", "Master", "Don", "Rev", "Dr\\.", "Major", "Sir", "Col", "Capt", "Jonkheer"))
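For readers without the titanic package at hand, here is a small self-contained sketch of the same setup. The three names are made up for illustration, in the "Surname, Title. Given names" format used by titanic$Name. Because paste() separates its arguments with a space by default, each pattern becomes something like ", Mr\\.", so it matches a title right after the surname's comma.

```r
# Same construction as in the question; paste() uses sep = " " by default
titles <- paste(",", c("Mr\\.", "Master", "Don", "Rev", "Dr\\.",
                       "Major", "Sir", "Col", "Capt", "Jonkheer"))
titles[1]
# [1] ", Mr\\."

# Hypothetical sample names (not taken from the titanic data)
pass_names <- c("Smith, Mr. John", "Jones, Mrs. Mary", "Brown, Master. Tom")

# One pattern tested against every name: one logical per name
grepl(titles[1], pass_names)
# [1]  TRUE FALSE FALSE
```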

Alexis Drakopoulos

2 Answers


You're right to be a bit confused.

The vapply code chunk in your question is equivalent to:

hits <- vapply(titles,
               FUN = function(x) grepl(x, pass_names),
               FUN.VALUE = logical(length(pass_names)))

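vapply takes a ... argument that accepts any number of additional arguments and forwards them to FUN. If those arguments are not named (see @Roland's comment), the n-th argument in the ... position is passed to the (n+1)-th argument of FUN, because the first argument of FUN receives the current element of X, i.e. one element of titles in this case. So each call is grepl(titles[j], pass_names): titles[j] becomes grepl's pattern and pass_names becomes its x.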
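A minimal sketch of that forwarding, using a shortened titles vector and made-up names (both hypothetical, chosen only to keep the output small). The three calls below pass pass_names to grepl in different ways and should all produce the same matrix:

```r
titles <- paste(",", c("Mr\\.", "Master", "Rev"))   # shortened for illustration
pass_names <- c("Smith, Mr. John", "Jones, Mrs. Mary",
                "Brown, Master. Tom", "Green, Rev. Alan")
n <- length(pass_names)

# 1. Extra argument passed positionally through `...`:
#    each call is grepl(titles[[j]], pass_names)
hits_a <- vapply(titles, grepl, logical(n), pass_names)

# 2. Anonymous function that closes over pass_names; nothing goes through `...`
hits_b <- vapply(titles, function(p) grepl(p, pass_names), logical(n))

# 3. Extra argument named explicitly; the element of titles then fills
#    grepl's first unmatched argument, `pattern`
hits_c <- vapply(titles, grepl, logical(n), x = pass_names)

identical(hits_a, hits_b)   # TRUE
identical(hits_a, hits_c)   # TRUE
```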

The resulting matrix has one row per element of pass_names (i.e. as many rows as titanic) and 10 columns, one per element of titles; vapply also names the columns after titles. The [i, j]-th entry is TRUE if the i-th element of pass_names matches the j-th regular expression in titles, and FALSE otherwise.
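A small sketch of reading that matrix, again with hypothetical names and only three patterns so the shape is easy to check by eye:

```r
titles <- paste(",", c("Mr\\.", "Master", "Rev"))
pass_names <- c("Smith, Mr. John", "Jones, Mrs. Mary",
                "Brown, Master. Tom", "Green, Rev. Alan")
hits <- vapply(titles, FUN = grepl,
               FUN.VALUE = logical(length(pass_names)), pass_names)

dim(hits)        # [1] 4 3  -> one row per name, one column per pattern
colnames(hits)   # the patterns themselves, taken from titles
hits[3, 2]       # TRUE: name 3 ("Brown, Master. Tom") matches pattern 2 (", Master")
rowSums(hits)    # [1] 1 0 1 1  -> "Jones, Mrs. Mary" matches none of these patterns
```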

Hugh
  • `*apply` functions can do name matching for arguments in `...`. Therefore the elements of `X` need not go to the first argument of `FUN`; e.g., `lapply(1:4, matrix, data = 1)` works. – Roland Feb 13 '18 at 14:23
  • From solving this question it is very evident to me that I just do not understand how these functions work at all. Let me try to see if I get this right. – Alexis Drakopoulos Feb 13 '18 at 14:28
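A short sketch of what Roland's comment means. In his example, `data` is matched by name, so each element of 1:4 lands in matrix()'s next unmatched argument, `nrow`. The same trick works with grepl: naming `pattern` sends each element of X to grepl's `x` argument instead (the two names below are made up):

```r
# data = 1 is matched by name; 1, 2, 3, 4 each fill `nrow`
m <- lapply(1:4, matrix, data = 1)
dim(m[[4]])   # [1] 4 1

# Naming `pattern` makes each element of pass_names go to grepl's `x`
pass_names <- c("Smith, Mr. John", "Jones, Mrs. Mary")
vapply(pass_names, grepl, logical(1), pattern = ", Mr\\.",
       USE.NAMES = FALSE)
# [1]  TRUE FALSE
```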

Essentially, vapply loops over titles, and grepl is itself vectorized over its x argument, so each call compares one title against every name. Together this is equivalent to two nested for loops, where each pairing is passed to grepl's two required arguments: grepl(pattern, x).

Specifically, on the first iteration of vapply, the first item in titles is compared with every item of pass_names. On the second iteration, the second item in titles is compared with all items of pass_names, and so on until titles is exhausted.

To illustrate, you can build an equivalent matrix, hits2, with nested for loops; it matches your vapply output, hits, exactly:

hits2 <- matrix(NA, nrow = length(pass_names), ncol = length(titles))
colnames(hits2) <- titles

# Outer loop over names (rows), inner loop over title patterns (columns)
for (i in seq_along(pass_names)) {
  for (j in seq_along(titles)) {
    hits2[i, j] <- grepl(pattern = titles[j], x = pass_names[i])
  }
}

all.equal(hits, hits2)
# [1] TRUE

Alternatively, you can run the exact same call with sapply, which has no FUN.VALUE argument: sapply is a wrapper around lapply that simplifies the result where it can. However, vapply is generally preferred because you declare the type and length of each result up front, while the shape of sapply's output depends on what FUN happens to return. For instance, with FUN.VALUE = integer(length(pass_names)), vapply returns an integer matrix of 0s and 1s, because it promotes logical results to integer.

hits3 <- sapply(titles, FUN = grepl, pass_names)

all.equal(hits, hits3)
# [1] TRUE
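A sketch of that difference, with two made-up names and a pattern (", Don") chosen so that it matches nothing. The integer template is accepted because vapply promotes logical to integer; a template of the wrong length makes vapply stop; and sapply, having no template, silently falls back to a list when grep() returns results of different lengths:

```r
pass_names <- c("Smith, Mr. John", "Jones, Mrs. Mary")   # hypothetical names
titles <- paste(",", c("Mr\\.", "Mrs\\.", "Don"))
n <- length(pass_names)

# 1. Integer template: logical results are promoted to 0/1 integers
hits_int <- vapply(titles, grepl, integer(n), pass_names)
storage.mode(hits_int)   # "integer"

# 2. Template of the wrong length: vapply signals an error instead of guessing
res <- tryCatch(vapply(titles, grepl, logical(1), pass_names),
                error = function(e) e)
inherits(res, "error")   # TRUE

# 3. sapply has no template: grep() returns match positions, and ", Don"
#    matches nothing (integer(0)), so sapply cannot simplify to a vector
pos <- sapply(titles, grep, pass_names)
is.list(pos)             # TRUE
```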

All in all, the apply family offers a more concise way to iterate: it returns a data structure directly, instead of requiring you to initialize a vector/matrix and assign into it inside a for or while loop.

For further reading, consider this interesting SO post: Is the “*apply” family really not vectorized?

Parfait