I'm trying to scrape this table titled Battle Styles into a dataframe. https://bulbapedia.bulbagarden.net/wiki/Battle_Styles_(TCG)#Set_lists
The problem is that many of the cells contain images that carry vital information, and rvest isn't picking that information up.
The table should look like this:
No.      Card name    Type   Rarity
001/163  Bellsprout   Grass  Common
002/163  Weepinbell   Grass  Uncommon
003/163  Victreebel   Grass  Rare
004/163  Cacnea       Grass  Common
005/163  Cacturne     Grass  Uncommon
006/163  KricketuneV  Grass  Ultra-Rare Rare
007/163  Cherubi      Grass  Common
008/163  Cherrim      Grass  Rare Holo
009/163  Carnivine    Grass  Uncommon
010/163  Durant       Grass  Uncommon
The table above is what I get if I copy the table from the page and paste it into Notepad.
However, my scraped version does not contain any of the information from the images. It looks like this:
# A tibble: 184 x 6
   No.     Image `Card name` Type  Rarity Promotion
   <chr>   <lgl> <chr>       <chr> <lgl>  <chr>
 1 001/163 NA    Bellsprout  ""    NA     Promotion
 2 002/163 NA    Weepinbell  ""    NA     Promotion
 3 003/163 NA    Victreebel  ""    NA     Promotion
 4 004/163 NA    Cacnea      ""    NA     Promotion
 5 005/163 NA    Cacturne    ""    NA     Promotion
 6 006/163 NA    Kricketune  ""    NA     Promotion
 7 007/163 NA    Cherubi     ""    NA     Promotion
 8 008/163 NA    Cherrim     ""    NA     Promotion
 9 009/163 NA    Carnivine   ""    NA     Promotion
10 010/163 NA    Durant      ""    NA     Promotion
The information I need from the images is in their alt text, so I feel like the solution should be straightforward, but I can't figure out how to get it.
Here's my code:
library(rvest)

BattlestylesURL <- "https://bulbapedia.bulbagarden.net/wiki/Battle_Styles_(TCG)"

# Collect every <table> node on the page
temp <- BattlestylesURL %>%
  read_html() %>%
  html_nodes("table")

# Table 16 is the set list I'm after
html_table(temp[16], fill = TRUE)
I think the biggest headache is that some columns combine images and text, and I want a dataframe that holds the information from both in the same column. For example, the "Card name" of row 6 is Kricketune V: "Kricketune" is text, but the "V" is an image.
I feel like there should be a simple way of doing it but I can't seem to wrap my head around it. Would greatly appreciate help!
The examples I've found are similar, such as "Scraping Wikipedia HTML table with images, text, and blank cells with R". However, I couldn't figure out how to apply them to this situation, because I'm also trying to keep the text that was already in the row.
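For illustration, here is a minimal sketch of one direction that might work: replace every <img> node with a text node holding its alt attribute before calling html_table(), so that cells mixing text and images keep both. It runs on a made-up HTML fragment rather than the live page, so the fragment, the file names in src, and the variable names page, imgs, and cards are all assumptions; the real table's markup may differ and may need extra cleanup.

```r
library(rvest)
library(xml2)

# Hypothetical fragment imitating the structure described above;
# it is NOT copied from the real Bulbapedia page.
html <- '
<table>
  <tr><th>No.</th><th>Card name</th><th>Type</th><th>Rarity</th></tr>
  <tr><td>001/163</td><td>Bellsprout</td><td><img alt="Grass" src="g.png"></td><td><img alt="Common" src="c.png"></td></tr>
  <tr><td>006/163</td><td>Kricketune<img alt="V" src="v.png"></td><td><img alt="Grass" src="g.png"></td><td><img alt="Ultra-Rare Rare" src="u.png"></td></tr>
</table>'

page <- read_html(html)

# Swap each <img> for a <span> whose text is the image's alt attribute.
imgs <- xml_find_all(page, ".//img")
for (i in seq_along(imgs)) {
  alt <- xml_attr(imgs[[i]], "alt")
  if (is.na(alt)) alt <- ""  # images without alt text become empty spans
  xml_replace(imgs[[i]], "span", alt)
}

# The table is now plain text, so html_table() can read it as usual.
cards <- html_table(html_node(page, "table"))
cards
```

On the real page, the same loop would presumably run on the document returned by read_html(BattlestylesURL) before the set-list table is selected. Any spacing between a card name and its image-derived suffix (for example "Kricketune" and "V") depends on the page's markup and may need adjusting afterwards.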