I'm trying to scrape multiple pages

Question

I'm trying to scrape review scores from multiple pages of the same gaming website (Metacritic).

I adapted the code from one of the answers to R web scraping across multiple pages and ran it:

    library(tidyverse)
    library(rvest)
    
    url_base <- "https://www.metacritic.com/browse/games/score/metascore/all/ps4?sort=desc&page=0"
    
    map_df(1:17, function(i) {
    
      cat(".")
      pg <- read_html(sprintf(url_base, i))
    
      data.frame(Name = html_text(html_nodes(pg, "#main .product_title a")),
                 MetaRating = as.numeric(html_text(html_nodes(pg, "#main .positive"))),
                 UserRating = as.numeric(html_text(html_nodes(pg, "#main .textscore"))),
                 stringsAsFactors = FALSE)
    
    }) -> ps4games_metacritic

The result is that the first page is scraped 17 times, instead of each of the 17 pages on the website being scraped once.

If you look at the answer you linked, you'll see that the page number in the URL was replaced by `%d`, which `sprintf()` then fills in. Your URL has no placeholder, so you scrape page 0 seventeen times. Try `url_base <- "https://www.metacritic.com/browse/games/score/metascore/all/ps4?sort=desc&page=%d"` instead. — Tonio Liebrand, Oct 10 '19 at 09:30
Possible duplicate of [R web scraping across multiple pages](https://stackoverflow.com/questions/36683510/r-web-scraping-across-multiple-pages) — Tonio Liebrand, Oct 10 '19 at 09:30
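The comment's point can be checked without touching the network. A minimal sketch: `sprintf()` substitutes each page number into the `%d` slot, while a URL with no placeholder comes back unchanged for every page. The `0:16` range assumes Metacritic's zero-based page numbering described in the answer below; `broken_base` is just a hypothetical name for the question's original URL.

```r
# Template from the comment: sprintf() replaces %d with the page number
url_base <- "https://www.metacritic.com/browse/games/score/metascore/all/ps4?sort=desc&page=%d"

# Assuming pages are numbered from 0, 17 pages are 0:16
urls <- sprintf(url_base, 0:16)
length(urls)    # 17
head(urls, 2)   # ...page=0, ...page=1

# The question's URL has no placeholder, so the page number is ignored
# and every element is the same page-0 URL. Newer R versions warn about
# the unused argument, hence suppressWarnings().
broken_base <- "https://www.metacritic.com/browse/games/score/metascore/all/ps4?sort=desc&page=0"
broken <- suppressWarnings(sprintf(broken_base, 1:17))
length(unique(broken))   # 1
```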

score 0 · Answer 1 · edited May 17 '21 at 14:24

I have made three changes to your code:

1. Since their page numbering starts at 0, `map_df(1:17, ...)` should be `map_df(0:16, ...)`.
2. As proposed by BigDataScientist, `url_base` needs a `%d` placeholder: `url_base <- "https://www.metacritic.com/browse/games/score/metascore/all/ps4?sort=desc&page=%d"`.
3. If you use `"#main .positive"`, you will get an error while scraping the 7th page, because games without positive scores start there and the `MetaRating` column no longer has the same length as the others. Unless you only want to scrape games with positive reviews (which would need somewhat different code), use `"#main .game"` instead.

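    library(tidyverse)  # provides purrr::map_df()
    library(rvest)      # read_html(), html_nodes(), html_text()
    
    # %d is the placeholder that sprintf() replaces with the page number
    url_base <- "https://www.metacritic.com/browse/games/score/metascore/all/ps4?sort=desc&page=%d"
    
    # Metacritic numbers its pages from 0, so 17 pages are 0:16
    map_df(0:16, function(i) {
    
      cat(".")  # progress indicator, one dot per page
      pg <- read_html(sprintf(url_base, i))
    
      # ".game" instead of ".positive" so games with lower scores are kept too
      data.frame(Name = html_text(html_nodes(pg, "#main .product_title a")),
                 MetaRating = as.numeric(html_text(html_nodes(pg, "#main .game"))),
                 UserRating = as.numeric(html_text(html_nodes(pg, "#main .textscore"))),
                 stringsAsFactors = FALSE)
    
    }) -> ps4games_metacritic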
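Change 3 avoids the length mismatch by choosing a selector that matches on every row. A more defensive variant, sketched here under assumptions, selects one container node per game and then looks up each field inside it with `html_node()`, which returns a missing node (and so `NA` text) when a field is absent; every column therefore has one entry per game. The inline markup, the `.product_row` class and the `parse_page()` helper are hypothetical stand-ins, not Metacritic's actual layout (which has changed since this answer), so inspect the live page before reusing the selectors.

```r
library(rvest)

# Hypothetical markup standing in for one results page
page_html <- '
<div id="main">
  <div class="product_row">
    <div class="product_title"><a>Game A</a></div>
    <span class="metascore_w game positive">92</span>
    <span class="textscore">8.1</span>
  </div>
  <div class="product_row">
    <div class="product_title"><a>Game B</a></div>
    <span class="metascore_w game mixed">64</span>
    <span class="textscore">tbd</span>
  </div>
</div>'

# Hypothetical helper: one row node per game, then one lookup per field,
# so a missing field becomes NA instead of shortening its column
parse_page <- function(pg) {
  rows <- html_nodes(pg, "#main .product_row")
  data.frame(
    Name       = html_text(html_node(rows, ".product_title a"), trim = TRUE),
    # non-numeric text such as "tbd" becomes NA
    MetaRating = suppressWarnings(as.numeric(html_text(html_node(rows, ".game")))),
    UserRating = suppressWarnings(as.numeric(html_text(html_node(rows, ".textscore")))),
    stringsAsFactors = FALSE
  )
}

games <- parse_page(read_html(page_html))
games
```

Inside the `map_df()` loop, `parse_page(pg)` would take the place of the `data.frame(...)` call.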
