I am trying to extract data about celebrity/notable deaths for analysis. Wikipedia uses a very regular URL structure for its monthly lists of notable deaths. It looks like:
https://en.wikipedia.org/wiki/Deaths_in_"MONTH"_"YEAR"
For example, this link leads to the notable deaths in March 2014:
https://en.wikipedia.org/wiki/Deaths_in_March_2014
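The URL pattern above can be built with a small helper. This is only a sketch: `deaths_url` is a hypothetical name, not something from rvest. It uses `paste0()`, which joins its arguments with no separator; plain `paste()` puts a space between arguments by default, and its `collapse` argument only controls how the elements of a vector are joined.

```r
# Hypothetical helper: build the page URL for one month and year.
deaths_url <- function(month, year) {
  paste0("https://en.wikipedia.org/wiki/Deaths_in_", month, "_", year)
}

deaths_url("March", 2014)
# [1] "https://en.wikipedia.org/wiki/Deaths_in_March_2014"

# month.name is R's built-in vector of English month names,
# so this gives the twelve URLs for 2015 in one call.
urls <- deaths_url(month.name, 2015)
```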
I have found that the CSS selector for the lists I need is "#mw-content-text h3+ ul li" and have extracted them successfully for one specific link. Now I'm trying to write a loop to go through the months of any years that I choose. I think it's a pretty straightforward nested loop, but I'm getting errors when testing it on just 2015.
library(rvest)
data = data.frame()
mlist = c("January","February","March","April","May","June","July","August",
"September","October","November","December")
for (y in 2015:2015){
  for (m in 1:12){
    # sep="" joins the pieces without spaces (collapse="" only affects vector elements)
    site = read_html(paste("https://en.wikipedia.org/wiki/Deaths_in_",mlist[m],
                           "_",y,sep=""))
    fnames = html_nodes(site,"#mw-content-text h3+ ul li")
    text = html_text(fnames)
    data = rbind(data,text,stringsAsFactors=FALSE)
  }
}
When I comment out the line:
data = rbind(data,text,stringsAsFactors=FALSE)
no errors are returned, so the problem is clearly in this line. I am posting my whole code in case there are other comments as well. The goal here is to loop through many years and then focus on the distribution over the years and months. For this I just need to keep the age, month, and year of death.
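One possible way around the `rbind` line is sketched below. My understanding (an assumption, not something I have verified in the rbind source) is that `rbind` treats a bare character vector as a single row, so months with different numbers of entries do not line up. The sketch instead stores each month as its own data frame, one row per entry, and binds everything once at the end. `collect_deaths` and `scrape_month` are hypothetical names; the scraper needs the rvest package and network access, so the call that uses it is left commented out.

```r
# Collect one data frame per month in a list, then bind once at the end.
# get_text is any function(year, month) that returns a character vector.
collect_deaths <- function(years, months, get_text) {
  pieces <- list()
  for (y in years) {
    for (m in months) {
      txt <- get_text(y, m)
      # One row per entry, tagged with its year and month, so months
      # with different numbers of entries stack cleanly.
      pieces[[length(pieces) + 1]] <- data.frame(
        year  = rep(y, length(txt)),
        month = rep(m, length(txt)),
        text  = txt,
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, pieces)
}

# Scraper using the same selector as in the question (requires rvest).
scrape_month <- function(y, m) {
  url  <- paste0("https://en.wikipedia.org/wiki/Deaths_in_", m, "_", y)
  site <- rvest::read_html(url)
  rvest::html_text(rvest::html_nodes(site, "#mw-content-text h3+ ul li"))
}

# deaths <- collect_deaths(2015, month.name, scrape_month)
```

Passing the scraper in as an argument is just a convenience so the loop can be tried with a stub function before hitting Wikipedia.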
Thank you!
EDIT: Sorry, they are technically warnings, not errors. I get over 50 of them and when I try to look at "data" it is a giant mess.
When I run this code outside the loop, on one specific URL, it works fine and returns readable output.
site = read_html("https://en.wikipedia.org/wiki/Deaths_in_January_2015")
fnames = html_nodes(site,"#mw-content-text h3+ ul li")
text = html_text(fnames)
Here are the first five rows from that data set:
text[1:5]
[1] "Barbara Atkinson, 88, British actress (Z-Cars).[1]"
[2] "Staryl C. Austin, 94, American air force brigadier general.[2]"
[3] "Ulrich Beck, 70, German sociologist, heart attack.[3]"
[4] "Fiona Cumming, 77, British television director (Doctor Who).[4]"
[5] "Eric Cunningham, 65, Canadian politician, Ontario MPP for Wentworth North (1975–1984).[5]"
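Since only the age, month, and year are needed, the age could be pulled out of rows like these with a regular expression. This is a rough sketch that assumes the age is always the first number sitting between two commas, which holds for the rows shown but may not hold for every entry on Wikipedia. `extract_age` is a hypothetical helper name.

```r
entries <- c(
  "Barbara Atkinson, 88, British actress (Z-Cars).[1]",
  "Ulrich Beck, 70, German sociologist, heart attack.[3]"
)

# Capture the first ", <digits>," group in each string.
# Entries with no such group come back as NA.
extract_age <- function(x) {
  hits <- regmatches(x, regexec(",\\s*([0-9]{1,3}),", x))
  ages <- vapply(
    hits,
    function(h) if (length(h) >= 2) h[2] else NA_character_,
    character(1)
  )
  as.integer(ages)
}

extract_age(entries)
# [1] 88 70
```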