
I am using rvest to get the hyperlinks from a Google search results page. User @AllanCameron helped me sketch this code in the past, but now I do not know how to change the XPath, or what else I need to do, to get the links. Here is my code:

library(rvest)
library(tidyverse)
#Code
#url
url <- 'https://www.google.com/search?q=Mario+Torres+Mexico'
#Get data
first_page <- read_html(url)
links <- html_nodes(first_page, xpath = "//div/div/a/h3") %>% 
  html_attr('href')

This returns nothing but NA values.

I would like to get the link for each search result, as in the screenshots below (sorry for the image quality):

enter image description here

enter image description here

Is it possible to store the links in a data frame? Many thanks!

user007

1 Answer


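The h3 nodes have no href attribute of their own, which is why html_attr('href') returns NA for each of them. Instead, step up to the parent a node of each h3 and read its href attribute. This gives you exactly one link per main title, which makes it easy to arrange both in a data frame.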

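# Select the result titles (h3 nodes nested inside links)
titles <- html_elements(first_page, xpath = "//div/div/a/h3")

# Step up to each title's parent <a>, read its href, and keep only the
# target URL (everything from "https" up to the first "&")
titles %>%
  html_elements(xpath = "./parent::a") %>%
  html_attr("href") %>%
  str_extract("https.*?(?=&)")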

[1] "https://www.linkedin.com/in/mario-torres-b5796315b"                                                           
[2] "https://mariolopeztorres.com/"                                                                                
[3] "https://www.instagram.com/mario_torres25/%3Fhl%3Den"                                                          
[4] "https://www.1stdibs.com/buy/mario-torres-lopez/"                                                              
[5] "https://m.facebook.com/2064681987175832"                                                                      
[6] "https://www.facebook.com/mariotorresmx"                                                                       
[7] "https://www.transfermarkt.us/mario-torres/profil/spieler/28167"                                               
[8] "https://en.wikipedia.org/wiki/Mario_Garc%25C3%25ADa_Torres"                                                   
[9] "https://circawho.com/press-and-magazines/mario-lopez-torress-legacy-is-still-being-woven-in-michoacan-mexico/"
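
To get the titles and links side by side in a data frame, something along these lines should work. This is only a minimal sketch: the HTML string is a made-up stand-in for the Google results page (the real markup differs and changes over time), and the names `html`, `page`, and `results` are hypothetical, introduced here so the example runs without a network request.

```r
library(rvest)
library(stringr)

# Hypothetical stand-in for a results page; NOT real Google markup
html <- '<html><body>
<div><div><a href="/url?q=https://example.com/a&amp;sa=U"><h3>First title</h3></a></div></div>
<div><div><a href="/url?q=https://example.org/b&amp;sa=U"><h3>Second title</h3></a></div></div>
</body></html>'
page <- read_html(html)

# Same XPath as above: h3 titles nested inside links
titles <- html_elements(page, xpath = "//div/div/a/h3")

# One row per title: the title text plus the href of its parent <a>
results <- data.frame(
  title = html_text2(titles),
  link = titles %>%
    html_elements(xpath = "./parent::a") %>%
    html_attr("href") %>%
    str_extract("https.*?(?=&)"),
  stringsAsFactors = FALSE
)

print(results)
```

Because each link is taken from the parent of a title node, both columns have the same length. With the real page you would replace `page` with `first_page` from the question.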
dcsuka
  • Fantastic! I have upvoted! Let's wait a bit to see if there is another option; otherwise I will accept yours! Many thanks! – user007 Sep 21 '22 at 23:32
  • @dcsuka Many thanks for the help. I wonder if you could help me with a serious issue I ran into while trying to crawl a page. I await your kind response. Kind regards! – user007 Oct 26 '22 at 15:45
  • Sure, just ask a question and link it. – dcsuka Oct 26 '22 at 17:30
  • Hi, many thanks again. This is the question; it would be great if you, as an expert, could help me! https://stackoverflow.com/questions/74214823/my-rselenium-code-is-not-able-to-find-web-elements – user007 Oct 26 '22 at 22:20