I am working with the R programming language.
I am trying to scrape the names, addresses and phone numbers of the pizza stores on this website:
- https://www.yellowpages.ca/search/si/2/pizza/Canada
- https://www.yellowpages.ca/search/si/3/pizza/Canada
- https://www.yellowpages.ca/search/si/4/pizza/Canada
- etc.
Using the answer provided here (R: Webscraping Pizza Shops - "read_html" not working?), I learned how to write the following function to perform this task:
library(tidyverse)
library(rvest)

scraper <- function(url) {
  page <- url %>%
    read_html()

  tibble(
    name = page %>%
      html_elements(".jsListingName") %>%
      html_text2(),
    address = page %>%
      html_elements(".listing__address--full") %>%
      html_text2()
  )
}

scraper("https://www.yellowpages.ca/search/si/2/pizza/Canada")
Now, I would like to include the phone number for each of these pizza shops.
Looking at the source code of this website, I see that this information is included within an <h4> tag.
But I am not sure how I can include this specification in the existing webscraping code.
Can someone please show me how to do this?
Thanks!
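
For reference, here is a minimal sketch of the general pattern for pulling text out of an <h4> nested inside each listing. It runs on a small made-up HTML snippet rather than the real page, so the class names "listing", "listing__name" and "listing__phone" are assumptions for illustration only and would need to be replaced with the selectors actually found in the page source. The idea is to select each listing container first and then call html_element() (singular) on it, which returns one result per listing and NA when a listing has no match, so the columns stay the same length.

```r
library(rvest)

# Toy markup standing in for the real page. The class names below are
# hypothetical and are NOT taken from yellowpages.ca.
page <- minimal_html('
  <div class="listing">
    <a class="listing__name">Pizza One</a>
    <ul class="listing__phone"><li><h4>555-0101</h4></li></ul>
  </div>
  <div class="listing">
    <a class="listing__name">Pizza Two</a>
  </div>
')

# One node per listing.
listings <- html_elements(page, ".listing")

# html_element() (singular) yields exactly one entry per listing,
# with NA where the listing has no matching <h4>.
result <- data.frame(
  name  = html_text2(html_element(listings, ".listing__name")),
  phone = html_text2(html_element(listings, ".listing__phone h4"))
)

result
```

The same pattern could be applied inside scraper() once the real container and phone selectors are known; using html_elements() (plural) directly on the whole page would instead drop listings without a phone number and could make the columns differ in length.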