I am working on a web scraping project using rvest. I have found useful posts about the task, but I am not getting the expected output. Basically, I want to get the result titles after a search is done in Google. For that I use the following code (based on the post "Web Scraping Google Result with R"):
library(rvest)
library(tidyverse)
#Code
#url
url <- 'https://www.google.com/search?q=Mario+Torres+Mexico'
#Get data
first_page <- read_html(url)
titles <- html_nodes(first_page, xpath = "//div/div/div/a/div[not(div)]") %>%
  html_text()
This works and returns:
titles
[1] "www.facebook.com › Pages › Public figure › Artist"
[2] "mx.linkedin.com › mario-torres-84ab9b1b"
[3] "mx.linkedin.com › ingmariotorres"
[4] "sic.cultura.gob.mx › ficha"
[5] "www.meer.com › authors › 826-mario-torres-dujisin"
[6] "www.transfermarkt.es › mario-torres › profil › spieler"
[7] "www.espn.com.ec › mma › peleador › mario-torres"
[8] "twitter.com › matorresr"
[9] "es.wikipedia.org › wiki › Jaime_Torres_Bodet"
[10] "www.instagram.com › mario_torres25"
However, I do not know whether it is possible to extract the titles shown next to each web link. In the screenshot I highlighted only the first two as an example, but I want all ten titles, in the same form as the previous output.
Is that possible? Many thanks!
Edit: Is it possible to extract the text framed in red?
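To make the goal concrete, here is a minimal sketch of the kind of result I am after. It assumes, without my having confirmed it against Google's real markup, that each title sits in an h3 inside the result link, next to the div that my current XPath matches. The sample_html string and the example.com links are made up for illustration.

library(rvest)

# Hypothetical stand-in for two search results; the real Google markup
# is different and changes often, so this only illustrates the idea.
sample_html <- '<html><body>
<div><div><div>
  <a href="https://example.com/a"><h3>Example title A</h3><div>example.com/a</div></a>
  <a href="https://example.com/b"><h3>Example title B</h3><div>example.com/b</div></a>
</div></div></div>
</body></html>'

page <- read_html(sample_html)

# My current XPath: picks the div under each link (the breadcrumb text)
breadcrumbs <- html_nodes(page, xpath = "//div/div/div/a/div[not(div)]") %>%
  html_text()

# Assumed variant: pick the h3 under each link instead (the title text)
titles <- html_nodes(page, xpath = "//div/div/div/a/h3") %>%
  html_text()

On this made-up fragment, breadcrumbs holds the link text and titles holds the two heading strings. Whether the same path applies to the page returned by read_html(url) is exactly what I am unsure about.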