I am trying to scrape name and address information from Yellow Pages (https://www.yellowpages.ca/). I have a function (taken from the question "(R) Webscraping Error : arguments imply differing number of rows: 1, 0") that retrieves this information:
library(rvest)
library(dplyr)
scraper <- function(url) {
  page <- url %>%
    read_html()
  tibble(
    name = page %>%
      html_elements(".jsListingName") %>%
      html_text2(),
    address = page %>%
      html_elements(".listing__address--full") %>%
      html_text2()
  )
}
However, the address is not always present. For example, several barbers are listed on this page (https://www.yellowpages.ca/search/si/1/barber/Sudbury+ON), and all but one of them have an address. As a result, when I run the function on it, I get the following error:
scraper("https://www.yellowpages.ca/search/si/1/barber/Sudbury+ON")
Error:
! Tibble columns must have compatible sizes.
* Size 14: Existing data.
* Size 12: Column `address`.
i Only values of size one are recycled.
Run `rlang::last_error()` to see where the error occurred.
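The size mismatch itself can be reproduced without touching the website, which may help isolate the problem. The following is a minimal sketch: the two vectors are made-up stand-ins for the scraped columns (their lengths are taken from the error message above), and `names_found` and `addresses_found` are hypothetical names, not part of the original function.

```r
library(dplyr)

# Hypothetical stand-ins for the scraped columns: 14 names, 12 addresses
names_found <- paste0("barber", 1:14)
addresses_found <- paste0("address", 1:12)

# tibble() only recycles length-one columns, so lengths 14 and 12 clash;
# tryCatch() captures the error message instead of stopping the script
result <- tryCatch(
  tibble(name = names_found, address = addresses_found),
  error = function(e) conditionMessage(e)
)
print(result)
```

Because each html_elements() call returns only the nodes that matched, the link between a name and its address is lost as soon as one listing has no address node.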
My question: Is there some way to modify the definition of the "scraper" function so that an NA appears in the address column when no address is listed? For example:
  barber    address
1 barber111 address111
2 barber222 address222
3 barber333 NA
Is there some way to add a statement similar to CASE WHEN that would grab the address when it is there and place an NA when it is not?
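One direction that might work, sketched here under stated assumptions: select one container node per listing first, then use html_element() (singular) inside each container. Unlike html_elements(), it returns exactly one result per input node, and html_text2() turns a missing match into NA. The HTML below is invented for illustration, and the ".listing" wrapper class is an assumption that would need to be checked against the real page source.

```r
library(rvest)
library(dplyr)

# Hypothetical markup standing in for a results page; the ".listing"
# wrapper class is an assumption, not confirmed for the real site
page <- minimal_html('
  <div class="listing">
    <a class="jsListingName">barber111</a>
    <span class="listing__address--full">address111</span>
  </div>
  <div class="listing">
    <a class="jsListingName">barber222</a>
    <span class="listing__address--full">address222</span>
  </div>
  <div class="listing">
    <a class="jsListingName">barber333</a>
  </div>
')

# One node per listing, so every row is built from the same container
listings <- page %>% html_elements(".listing")

result <- tibble(
  # html_element() yields one result per listing (a missing node if absent),
  # and html_text2() converts a missing node to NA
  name = listings %>%
    html_element(".jsListingName") %>%
    html_text2(),
  address = listings %>%
    html_element(".listing__address--full") %>%
    html_text2()
)
print(result)
```

In the real function, minimal_html(...) would be replaced by read_html(url), and ".listing" by whichever element actually wraps a single search result.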