I'm working on a project with a large number of tables stored in HTML pages. While scraping them, I've run into the following problem.
Some of the tables that I am scraping look like this
When importing the data frames, I had to pass fill = TRUE to html_table() because some rows are merged cells ("chicken" and "chicken without bones"):
library(rvest)
# link and node are set earlier in the script
read_html(link) %>%
  html_nodes(node) %>%
  html_table(fill = TRUE, header = TRUE, dec = ",")
but this gave me tables like this:
df <- data.frame(
  year = c("chicken", 2000, 2001, 2002, "chicken without bones", 2003, 2004, 2005, "chicken without bones and feet", 2006, 2007, 2008),
  weight = c("chicken", 5, 6, 4, "chicken without bones", 2, 1, 3, "chicken without bones and feet", 1, 1.5, 2)
)
I'm trying to find a way to make my tables look like this:
df2 <- data.frame(
  year = c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008),
  number = c(5, 6, 4, 2, 1, 3, 1, 1.5, 2),
  new_variable = c("chicken", "chicken", "chicken",
                   "chicken without bones", "chicken without bones", "chicken without bones",
                   "chicken without bones and feet", "chicken without bones and feet", "chicken without bones and feet")
)
I'm struggling with R and still have no idea how to do this for my 1,028,974 scraped tables. Note: there is no regular pattern for where the merged rows occur, so I need code that identifies the filled rows, takes their values as character strings, and uses them as the values of a new column until the next filled row appears.
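To make the logic I'm after concrete, here is a rough base-R sketch of it, not a tested solution for all my tables. It assumes that, after fill = TRUE, a merged row shows up as a row whose cells all repeat the same non-numeric text. The function name split_fill_rows, its arguments new_col and dec, and the small example table are my own illustrative names, not part of rvest.

```r
# Hypothetical helper: turn "fill" rows into a label column.
# Assumption: a fill row repeats one non-numeric string in every cell.
split_fill_rows <- function(tbl, new_col = "new_variable", dec = ".") {
  tbl <- as.data.frame(tbl, stringsAsFactors = FALSE)
  tbl[] <- lapply(tbl, as.character)
  first <- tbl[[1]]

  # TRUE where every column equals the first column (NA counts as FALSE)
  same <- Reduce(`&`, lapply(tbl, function(col) col == first)) %in% TRUE
  # TRUE where the first cell cannot be read as a number
  not_num <- is.na(suppressWarnings(
    as.numeric(sub(dec, ".", first, fixed = TRUE))
  ))
  is_fill <- same & not_num

  # Carry each fill value forward until the next fill row;
  # rows before the first fill row get NA
  grp <- cumsum(is_fill)
  label <- c(NA_character_, first[is_fill])[grp + 1]

  # Drop the fill rows and restore numeric types on what is left
  out <- tbl[!is_fill, , drop = FALSE]
  out[] <- lapply(out, type.convert, as.is = TRUE, dec = dec)
  out[[new_col]] <- label[!is_fill]
  rownames(out) <- NULL
  out
}

# Small example shaped like the scraped tables above
df <- data.frame(
  year   = c("chicken", 2000, 2001, "chicken without bones", 2003, 2004),
  weight = c("chicken", 5, 6, "chicken without bones", 2, 1.5)
)
res <- split_fill_rows(df)
```

If the scraped tables are kept in a list (say a hypothetical object called tables), the same helper could be applied to all of them with lapply(tables, split_fill_rows). For tables whose numbers still use a decimal comma, dec = "," would be passed instead.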
Thanks for your attention!