Essentially, each member of the apply family by default returns either:
- a simplified object (vector, matrix, array) where all elements are the same atomic type such as logical, integer, double, complex, raw;
- a non-simplified object (data frame, list) whose elements are not necessarily the same type and can include more complex structures such as classed objects.
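As a minimal sketch of that distinction (the squaring function is only an illustration, not part of the original question), the same call can return an atomic vector or a list depending on which sibling is used:

```r
# sapply simplifies by default: the result is an atomic double vector
squares_vec <- sapply(1:3, function(x) x^2)
is.atomic(squares_vec)   # TRUE
typeof(squares_vec)      # "double"

# lapply never simplifies: the result is a list with one element per input
squares_list <- lapply(1:3, function(x) x^2)
is.list(squares_list)    # TRUE
length(squares_list)     # 3
```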
To adequately translate your for loop into an apply-family function, you must first ask: what is the input type, and what is the desired output type? Because read_html returns a special class object of XML types, its result does not fit into an atomic vector or matrix. Therefore, lapply would be the best translation of the for loop here. However, its siblings could also work with various changes to defaults or inputs:
lapply
lapply(urls, read_html)
apply (requires an input of at least 2 dimensions, such as a matrix or array)
apply(matrix(urls), 1, read_html)
sapply (wrapper to lapply, but requires the simplify = FALSE argument)
sapply(urls, read_html, simplify=FALSE)
by (object-oriented wrapper to tapply)
by(urls, urls, function(x) read_html(as.character(x)))
mapply (requires the SIMPLIFY = FALSE argument, which makes it equivalent to its wrapper, Map)
mapply(read_html, urls, SIMPLIFY = FALSE)
Map(read_html, urls)
rapply (requires nested list transformation, with list output)
urls_list <- list(u1 = urls[1], u2 = urls[2])
rapply(urls_list, read_html, how="list")
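The list-returning variants above can be compared without network access. The sketch below assumes a hypothetical stand-in, fake_read, that returns a small classed list in place of read_html; it is not part of rvest or xml2 and only mimics the shape of the result.

```r
# Hypothetical stand-in for read_html(): returns a classed list so the
# example runs without rvest/xml2 or an internet connection.
fake_read <- function(url) {
  structure(list(node = url, doc = nchar(url)), class = "fake_document")
}

urls <- c("https://www.r-bloggers.com", "https://www.stackoverflow.com")

res_lapply <- lapply(urls, fake_read)
res_sapply <- sapply(urls, fake_read, simplify = FALSE)
res_mapply <- mapply(fake_read, urls, SIMPLIFY = FALSE)
res_map    <- Map(fake_read, urls)

# Every variant keeps one classed object per URL
inherits(res_lapply[[1]], "fake_document")

# sapply, mapply and Map name the list by URL; apart from those names
# the results match the plain lapply output
identical(unname(res_sapply), res_lapply)
identical(unname(res_mapply), res_lapply)
identical(unname(res_map), res_lapply)
```

If the URL names are useful for later lookup, sapply with simplify = FALSE or Map may be the more convenient choice; otherwise lapply is the most direct.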
The functions below will not work as intended because their defaults try to simplify the result. In the printed output, ? stands for an external pointer.
sapply (default setting)
sapply(urls, read_html)
# https://www.r-bloggers.com https://www.stackoverflow.com
# node ? ?
# doc ? ?
vapply (usually only returns simplified objects)
vapply(urls, read_html, vector(mode="list", length=2))
# https://www.r-bloggers.com https://www.stackoverflow.com
# node ? ?
# doc ? ?
mapply (default setting)
mapply(read_html, urls)
# https://www.r-bloggers.com https://www.stackoverflow.com
# node ? ?
# doc ? ?
rapply (default setting)
rapply(urls_list, read_html)
# $u1.node
# <pointer: 0x8638eb0>
# $u1.doc
# <pointer: 0x6f79b30>
# $u2.node
# <pointer: 0x9c98930>
# $u2.doc
# <pointer: 0x9cb19a0>
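The matrix-shaped output shown above can be reproduced offline. This sketch again assumes a hypothetical fake_read function standing in for read_html; it returns a classed list with node and doc elements only to mimic the structure.

```r
# Hypothetical stand-in for read_html(), used only for illustration
fake_read <- function(url) {
  structure(list(node = url, doc = nchar(url)), class = "fake_document")
}

urls <- c("https://www.r-bloggers.com", "https://www.stackoverflow.com")

# Default sapply simplifies: the two components of each object become
# rows of a list-matrix, and the class of each object is lost
collapsed <- sapply(urls, fake_read)
is.matrix(collapsed)
typeof(collapsed)
rownames(collapsed)

# lapply keeps each result intact as a classed object
kept <- lapply(urls, fake_read)
inherits(kept[[1]], "fake_document")
```

Because the simplified result is a matrix of components rather than a list of documents, downstream functions that expect a document object would likely fail on it.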
See the SO post below for further reading:
Grouping functions (tapply, by, aggregate) and the *apply family