I am working on a web scraping project using rvest. I have found useful posts about this task, but I am not getting the expected output. Basically, I want to get the names from the result titles after running a Google search. For that I use the following code (based on this post):

Web Scraping Google Result with R

library(rvest)
library(tidyverse)
# Search URL
url <- 'https://www.google.com/search?q=Mario+Torres+Mexico'
# Download and parse the results page
first_page <- read_html(url)
# Extract the text of the innermost <div> inside each result link
titles <- html_nodes(first_page, xpath = "//div/div/div/a/div[not(div)]") %>%
  html_text()

Which works and returns this:

titles
 [1] "www.facebook.com › Pages › Public figure › Artist"     
 [2] "mx.linkedin.com › mario-torres-84ab9b1b"               
 [3] "mx.linkedin.com › ingmariotorres"                      
 [4] "sic.cultura.gob.mx › ficha"                            
 [5] "www.meer.com › authors › 826-mario-torres-dujisin"     
 [6] "www.transfermarkt.es › mario-torres › profil › spieler"
 [7] "www.espn.com.ec › mma › peleador › mario-torres"       
 [8] "twitter.com › matorresr"                               
 [9] "es.wikipedia.org › wiki › Jaime_Torres_Bodet"          
[10] "www.instagram.com › mario_torres25"  

However, I do not know whether it is possible to extract the names below each web link. Graphically, I mean these (I only highlighted the first two as an example, but I want all ten titles, in the same form as the previous output):

[Screenshot: Google results with the first two titles highlighted]

Is that possible? Many thanks!

Edit: Is it possible to extract the text framed in red?

[Screenshot: search result with the text in question framed in red]

user007

1 Answer

Google searches change according to locale and also over time, so the list I get is different from yours. However, the xpath should be the same:

html_nodes(first_page, xpath = "//div/div/div/a/h3") %>% html_text()
#> [1] "Mario García Torres - Wikipedia"                              
#> [2] "Mario Torres (@mario_torres25) • Instagram photos and videos" 
#> [3] "Mario Torres - Regional manager Mexico and Central America"   
#> [4] "Mario Lopez Torres - A Furniture And Art Experience"          
#> [5] "Mario García Torres | The Guggenheim Museums and Foundation"  
#> [6] "Mario Torres - Player profile | Transfermarkt"                
#> [7] "Mario Torres Lopez - 33 For Sale on 1stDibs - 1stDibs"        
#> [8] "Mario Lopez Torres - 12 For Sale at 1stdibs"                  
#> [9] "Mario Lopez Torres Furniture | On the Town, Hispanic Heritage"
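
Because the live results page differs by locale and over time, the two XPath expressions can be tried on a small, fixed snippet first. The sketch below is only an illustration: the HTML string is a hypothetical, heavily simplified stand-in for a results page (real Google markup is more deeply nested), and the example.com and example.org links and titles are made up.

```r
library(rvest)

# Hypothetical, simplified markup: each result is an <a> nested three <div>s
# deep, holding an <h3> title and a <div> with the breadcrumb text.
sample_html <- '
<html><body>
<div><div><div>
  <a href="https://example.com/a">
    <h3>Mario Torres - Example profile</h3>
    <div>example.com - a</div>
  </a>
</div></div></div>
<div><div><div>
  <a href="https://example.org/b">
    <h3>Mario Torres | Example page</h3>
    <div>example.org - b</div>
  </a>
</div></div></div>
</body></html>'

page <- read_html(sample_html)

# Titles: the <h3> directly inside each result link
titles <- page %>%
  html_nodes(xpath = "//div/div/div/a/h3") %>%
  html_text()

# Breadcrumbs: the innermost <div> inside each result link
crumbs <- page %>%
  html_nodes(xpath = "//div/div/div/a/div[not(div)]") %>%
  html_text()

# Target URLs: the href of every link that contains an <h3>
links <- page %>%
  html_nodes(xpath = "//div/div/div/a[h3]") %>%
  html_attr("href")

results <- data.frame(title = titles, crumb = crumbs, link = links,
                      stringsAsFactors = FALSE)
print(results)
```

On a real page the three vectors may differ in length (for example when a result has no breadcrumb), so it is worth checking the lengths before combining them into a data frame.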
Allan Cameron
  • Many thanks for your answer. Could you explain where I can see that xpath after inspecting the web page? It is difficult for me to find it! – user007 Jul 31 '22 at 12:38
  • I did the xpath manually, but if you select the relevant node in the inspector, right-click on it in the inspection pane, then select "Copy >", it should open a drop-down that includes xpath (this is in Firefox, but I'm guessing Chrome is similar). – Allan Cameron Jul 31 '22 at 12:44
  • Hi @AllanCameron, I hope you are well. I know you have answered my question, but a similar issue has come up. I also need to extract the text added in the edit (circled in red). I have inspected the page but have had no success finding the xpath. Could you please help me with that? Many thanks! – user007 Aug 01 '22 at 19:59
  • Hi @AllanCameron, I have created a new question for what I asked earlier. If you could help, I will accept as soon as possible. Many thanks! https://stackoverflow.com/questions/73215569/how-to-retrieve-text-below-titles-from-google-search-using-rvest – user007 Aug 03 '22 at 02:07
  • Hi @AllanCameron, I hope you are well. If you have time, could you please take a look at this? https://stackoverflow.com/questions/73215569/how-to-retrieve-text-below-titles-from-google-search-using-rvest – user007 Aug 04 '22 at 15:16
  • Hi @AllanCameron, there is a bounty on the question if you are interested: https://stackoverflow.com/questions/73215569/how-to-retrieve-text-below-titles-from-google-search-using-rvest – user007 Aug 05 '22 at 11:20
  • @user007 Thanks. I have had a chance to look at this, and have posted an answer on your question. – Allan Cameron Aug 05 '22 at 12:43