While trying to scrape information from several links, I got this error: Error in open.connection(x, "rb") : HTTP error 404.
I suspect it has something to do with the first part of my for loop, so I tried converting numbers
from character to numeric, but that did not fix the problem. I also tried the advice here; however, it caused more problems.
Can you spot where I went wrong?
library(rvest)
library(tidyverse)
pageMen = read_html('https://www.bjjcompsystem.com/tournaments/1869/categories')
get_links <- pageMen %>%
  html_nodes('.categories-grid__category a') %>%
  html_attr('href') %>%
  paste0('https://www.bjjcompsystem.com', .)
# extract numerical part of link
numbers = str_sub(get_links, - 7, - 1)
numbers = as.numeric(numbers)
## create empty data frame ------------------------
master1.tree = data.frame()
## Create for loop ---------------------------------
for (i in length(numbers)){
  url <- read_html(paste0('https://www.bjjcompsystem.com/tournaments/1869/categories/', i))
  ageDivision <- url %>% html_nodes('.category-title__age-division') %>% html_text()
  gender <- url %>% html_nodes('.category-title__age-division+ .category-title__label') %>% html_text()
  matches = data.frame('division' = ageDivision,'gender' = gender)
  master1.tree <- rbind(master1.tree, data.frame(matches))
}
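One likely cause of the 404, sketched offline: length(numbers) is a single value, so the loop body runs once with i equal to the number of categories, and the URL ends in that count rather than in a category ID. The IDs below are hypothetical placeholders, not values taken from the site.

```r
# Hypothetical category IDs standing in for the scraped `numbers` vector
numbers <- c(2053146, 2053150, 2053154)

# length(numbers) is the single value 3, so this loop runs once with i = 3
visited_once <- c()
for (i in length(numbers)) {
  visited_once <- c(visited_once, i)
}

# Iterating over the vector itself visits every ID and builds one URL per ID
base_url <- "https://www.bjjcompsystem.com/tournaments/1869/categories/"
urls <- character(0)
for (id in numbers) {
  urls <- c(urls, paste0(base_url, id))
}
```

Since get_links already holds the full URLs, looping over get_links directly would probably avoid rebuilding them from numbers at all.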
I also ran this, but it did not store the scraped data in a data frame. Instead, it printed the results on the screen:
map_df(get_links, function(i){
  url <- read_html(i)
  matches <- data.frame(ageDivision <- url %>%
                          html_nodes('.category-title__age-division') %>% html_text(),
                        gender <- url %>% html_nodes('.category-title__age-division+ .category-title__label') %>% html_text() )
  master1.tree <- rbind(master1.tree, matches)
})
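A possible explanation for the printing, as a minimal sketch: map_df() returns the combined data frame, so calling it without assigning the result just prints it, and the rbind() inside the function only changes a local copy of master1.tree. The scrape_one function and the links vector below are hypothetical stand-ins so the example runs without network access.

```r
library(purrr)  # map_df() also needs dplyr installed for the row-binding

# Hypothetical stand-in for the per-page scrape: returns one row per input
scrape_one <- function(link) {
  data.frame(division = paste("division", link), gender = "Male")
}

links <- c("a", "b", "c")  # placeholder values, not real URLs

# map_df() row-binds the data frames returned by each call;
# assigning the result keeps it instead of printing it
master1.tree <- map_df(links, scrape_one)
```

Naming the columns with = inside data.frame(), rather than <-, should also give the result clean column names.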