I am trying to parse every sitemap listed in a sitemap index. So far I have created an object x that holds all three sitemaps from the index.
I can create a separate object for each nested XML document and then rbind() them together, but I believe a function would be easier. I tried writing a for loop and using sapply, but I get an error, apparently because I am passing a list of lists into sapply.
My aim is to take the xml_children of every sitemap and combine them into a single data frame, since repeating the manual approach below for a list of 50 XML documents would be very tedious.
library(xml2)
library(dplyr)
sitemap_index <- read_xml("https://www.bodystore.com/sitemap_index.xml")
sitemap_urls <- xml_children(sitemap_index) %>% xml_to_dataframe() %>% rename(url = loc)
x contains the parsed sitemap for each URL in the sitemap index:
x <- lapply(sitemap_urls$url, read_xml)
#creating an empty dataframe
all_sitemaps <- data.frame()
#saving each part of the list
x1 <- x[[1]] %>% xml_children() %>% xml_to_dataframe()
x2 <- x[[2]] %>% xml_children() %>% xml_to_dataframe()
x3 <- x[[3]] %>% xml_children() %>% xml_to_dataframe()
all_sitemaps <- rbind(x1,x2,x3)
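The repeated x1, x2, x3 steps are what I would like to collapse into one call. A minimal sketch of the shape I have in mind, assuming each sitemap is a flat list of nodes with simple text children: the two in-memory documents and the helper name sitemap_to_df are made up for illustration and stand in for the downloaded sitemaps and for xml_to_dataframe.

```r
library(xml2)
library(dplyr)

# Two tiny in-memory sitemaps standing in for the downloaded ones
# (hypothetical URLs, for illustration only).
docs <- list(
  read_xml("<urlset><url><loc>https://example.com/a</loc><lastmod>2020-01-01</lastmod></url></urlset>"),
  read_xml("<urlset><url><loc>https://example.com/b</loc></url><url><loc>https://example.com/c</loc></url></urlset>")
)

# Hypothetical helper: one row per child node of the document root,
# one column per tag found inside that node.
sitemap_to_df <- function(doc) {
  rows <- lapply(xml_children(doc), function(node) {
    kids <- xml_children(node)
    as.data.frame(as.list(setNames(xml_text(kids), xml_name(kids))),
                  stringsAsFactors = FALSE)
  })
  bind_rows(rows)
}

# Apply the helper to every document, then stack the results;
# bind_rows() fills columns missing from a sitemap with NA.
all_sitemaps <- bind_rows(lapply(docs, sitemap_to_df))
```

Using lapply() keeps the result a list of data frames whatever their size, whereas sapply() tries to simplify its output, which may be part of why my attempts failed.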
xml_to_dataframe is a custom function that parses an xml_nodeset into a data frame:
xml_to_dataframe <- function(nodeset) {
  # Only accept node sets, e.g. the result of xml2::xml_children()
  if (!inherits(nodeset, "xml_nodeset")) {
    stop('Input should be "xml_nodeset" class')
  }
  # Turn each node into a named list: child tag name -> child text
  lst <- lapply(nodeset, function(x) {
    children <- xml2::xml_children(x)
    tmp <- xml2::xml_text(children)
    names(tmp) <- xml2::xml_name(children)
    as.list(tmp)
  })
  # Bind the one-row data frames together, filling missing columns with NA
  result <- do.call(plyr::rbind.fill, lapply(lst, function(x)
    as.data.frame(x, stringsAsFactors = FALSE)))
  dplyr::as_tibble(result)
}
Thank you very much for your help.